OpenZFS Fixes Data Corruption Issue

OpenZFS Fixes Data Corruption Issue (phoronix.com) 39

Posted by EditorDavid on Saturday December 02, 2023 @10:34PM from the bad-bugs dept.

A pull request has been merged to fix a data corruption issue in OpenZFS (the open-source implementation of the ZFS file system and volume manager). "OpenZFS 2.2.2 and 2.1.14 released with fix in place," reports a Thursday comment on GitHub.

Earlier this week, jd (Slashdot reader #1,658) wrote: All versions of OpenZFS 2.2 suffer from a defect that can corrupt the data. Attempts to mitigate the bug have reduced the likelihood of it occurring, but so far nobody has been able to pinpoint what is going wrong or why.

Phoronix reported on Monday: Over the US holiday weekend it became more clear that this OpenZFS data corruption bug isn't isolated to just the v2.2 release — older versions are also susceptible — and that v2.2.1 is still prone to possible data corruption. The good news at least is that data corruption in real-world scenarios is believed to be limited but with some scripting help the corruption can be reproduced. It's also now believed that the OpenZFS 2.2 block cloning feature just makes encountering the problem more likely.


- Re: (Score:2)
  
  by The Evil Atheist ( 2484676 ) writes:
  
  > ALL of us (and them) deserve the audacious thought that data, once committed to the FS... stays that way.
  You don't, actually. No one deserves anything. That's why there are all these mitigating practices to increase the likelihood of detection of data corruption, if not outright recovering or repairing. No filesystem can ever guarantee complete protection against data loss.
  
  Guess what? The physical world is messy. The physical world is not software. Software is an abstraction on top of a lie about the stability of information. Every bit of hardware you own has defects of one kind or another that its firmware
  - Re:Damn that Hans guy (Score:4, Informative)
    
    by jd ( 1658 ) writes: <imipak@yahoGINSBERGo.com minus poet> on Sunday December 03, 2023 @08:48AM (#64050601) Homepage Journal
    
    It's true that the physical world is messy, so software that deals with data integrity is important.
    However, software only gets you so far, which is why you should absolutely use ECC RAM on critical machines, even if you're using filesystems like XFS or ZFS. Corruption in RAM is very difficult for software to detect.
    For absolutely critical data, RAID modes (either Linux's software RAID for XFS or ZFS's own built-in RAID variant) are also advised, because redundant hardware is better than crashed hardware.
    Arguably, you can improve hardware reliability. Backblaze consistently shows a large variability between manufacturers in terms of reliability, which suggests differing levels of enthusiasm for quality control. It's also doubtful any manufacturer is close to the limit on reliability, either, as they'll be trying to go for a balance - too much quality means uncompetitively expensive and lower repeat sales for drive replacement. However, the best you could theoretically get would still be a long way from perfect.
    https://www.backblaze.com/blog... [backblaze.com]
    
    - Re: (Score:2)
      
      by OrangeTide ( 124937 ) writes:
      
      Many years ago I dealt with bad RAM in multiple RAID controllers at work. If things are really critical, then end-to-end ECC and redundant controllers with multiple paths to the same disks become an expensive requirement.
- - Re: (Score:2)
    
    by ctilsie242 ( 4841247 ) writes:
    
    Filesystem development is easy. Having a filesystem that is consistent and atomic, that can recover data and metadata, that works with files like swapfiles, and that may even have to worry about partitioning and multiple hosts working with the data at the same time, is damn hard. People need filesystems to do a lot more work, including compression, encryption, and deduplication, and perhaps even to handle things at the logical-partition or RAID level.
    Take a look at btrfs. This filesystem took a long time, but it managed to get a huge
    - Re: (Score:2)
      
      by jd ( 1658 ) writes:
      
      BtrFS is certainly improving and one day I'll even try it again.
      For machines with small or medium amounts of memory, BtrFS is better than ZFS, as ZFS eats memory for breakfast. It is incredibly resource-intensive. BtrFS is probably faster for some use cases, as ZFS is slow.
      BtrFS is also easier to use: ZFS uses its own logical volume manager and RAID software rather than the standard kit.
      Really, the choice of filesystem comes down to the use case. I still use different partitions for different parts of my Linux system and choose t
      - Re: (Score:2)
        
        by Temkin ( 112574 ) writes:
        
        > as ZFS eats memory for breakfast.
        I believe the old IBM quote applies here... "If you want to eat Hippopotamus, you have to pay the freight."
        
        Filesystems tradeoffs (Score:2)
        
        by jd ( 1658 ) writes:
        
        That's fair enough. However, Linux developers over the years have discovered that - usually - it's cheaper on resources to stack very simple layers to produce a complex result, rather than to have one very complex do-everything layer.
        I emphasise usually, because that's not always the case. dm-integrity is a relatively simple layer that adds extensive integrity checking to associated block devices. It is, however, incredibly slow. This might improve over time, or it might be that the right logic was added to
        
        Re: (Score:3)
        
        by Temkin ( 112574 ) writes:
        
        > However, Linux developers over the years have discovered
        
        ZFS came from Sun Microsystems' Solaris Unix. These are the developers who created a certified five-nines Oracle cluster back in 1995 that could have the nodes and half the storage separated by 3km of fiber. When they closed what is now Meta's campus, they found an unpatched Solaris machine with a 5-year uptime. And SMF vs. systemd? Definitely not Linux developers, not even close...
        
        Re: (Score:1)
        
        by John-after-logtime ( 6156490 ) writes:
        
        And that 3km cluster came from DEC VAX/VMS
        
        Re: (Score:2)
        
        by jd ( 1658 ) writes:
        
        What does that have to do with the point that multiple light layers can sometimes produce the same complexity at lower cost than a heavy layer that tries to do everything?
        We already know from the trouncing RISC gave CISC (to the point that later x86 and x64 designs decode complex instructions into micro-operations executed by a much simpler core) that simple can be better. The skill of Sun has little to do with whether complex is better than layered simple.
        Back when Sun developed ZFS, the battle ove
        
        Re: (Score:2)
        
        by Temkin ( 112574 ) writes:
        
        > Back when Sun developed ZFS, the battle over simple vs complex was not yet over
        Who says it's over now? Have you read the NVMe 1.4 spec?
      - Re: (Score:2)
        
        by ctilsie242 ( 4841247 ) writes:
        
        This, exactly. For really low-RAM applications, ext4, or even ext3... maybe ext2 if a journal isn't needed and the FS is read-only. For the main root filesystem on an average desktop, btrfs is a solid choice. XFS fits in there somewhere too: applications like MinIO list it as a best practice because it is quick, and MinIO handles checksumming and self-healing of the data itself. ZFS is definitely good.
        Overall, I'm just glad we have a choice, and the filesystems available are quite robust.
      - Re: (Score:2)
        
        by darkain ( 749283 ) writes:
        
        "ZFS eats memory for breakfast"
        This is a misconception that has lived for far too long. I actively run OpenZFS on 1GB Raspberry Pis just fine. OpenZFS has its own filesystem cache, the ARC, which the OS reports as "used" RAM rather than "cache" RAM; however, the ARC can be evicted under memory pressure, just as the OS does with its native cache. If you use an OS like FreeBSD, it properly and accurately reports the ARC separately from "used" RAM, so it isn't a concern mentally, as you'
        
        Re: (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        It's only if you have dedup turned on that ZFS eats RAM.
  - Re: (Score:2)
    
    by jmccue ( 834797 ) writes:
    
    > Filesystem development is the most difficult thing any programmer can do. Try it.
    Yes, but I'd say payroll development is far more dangerous for a programmer.
- Re: (Score:2)
  
  by bill_mcgonigle ( 4333 ) * writes:
  
  > There's no discussion of ZFS without discussing when you murder your wife
  How drunk are you right now?
  - - Re: (Score:2)
      
      by Some nick or other ( 4033849 ) writes:
      
      That dude was right though, ReiserFS is a killer.
      Even before Reiser himself destroyed its reputation, ReiserFS was known for not caring much about integrity. Its fsck would scan the whole disk for its B-tree node signatures, since the nodes could be anywhere on disk. If you had a ReiserFS disk image stored somewhere on that disk, fsck would get confused and behave more like mkfs.
  - Re: (Score:2)
    
    by NoWayNoShapeNoForm ( 7060585 ) writes:
    
    > > There's no discussion of ZFS without discussing when you murder your wife
    > How drunk are you right now?
    More like stoned ... I think.
- Re: (Score:2)
  
  by backslashdot ( 95548 ) writes:
  
  Now might be a good time to speak to your doctor about upping your medication.
- Re: (Score:2)
  
  by NoWayNoShapeNoForm ( 7060585 ) writes:
  
  DUDE ... what are you smoking and where can I get some of it?
- Re: (Score:2)
  
  by jd ( 1658 ) writes:
  
  ReiserFS is not ZFS or Btrfs. ReiserFS is scheduled for removal from the Linux kernel, and I suspect Reiser4 will follow. I only ever used ReiserFS for /tmp because it handles small files very fast (and that's mostly what you have in the temporary directory) but screws up on data integrity (which is fine in a directory you delete on reboot). However, there are other filesystems I can use there.
  XFS is apparently the default on Red Hat, which is puzzling as Red Hat are owned by IBM and IBM wrote JFS. XFS was written by their compe
For sufficiently loose definitions of "fixed" (Score:2, Informative)

by 93 Escort Wagon ( 326346 ) writes:

"All versions of OpenZFS 2.2 suffer from a defect that can corrupt the data. Attempts to mitigate the bug have reduced the likelihood of it occurring, but so far nobody has been able to pinpoint what is going wrong or why."
They've cut down on how often it can occur... which is certainly a good thing! But the problem is definitely NOT fixed and the headline is WRONG.
- Re:For sufficiently loose definitions of "fixed" (Score:5, Informative)
  
  by bill_mcgonigle ( 4333 ) * writes: on Saturday December 02, 2023 @11:10PM (#64050185) Homepage Journal
  
  They found the problem and fixed it (two code paths needed to check whether the dnode was dirty), and despite significant testing they haven't been able to reproduce the error (on systems that could reproduce it within two minutes before the fix).
  See: https://github.com/openzfs/zfs... [github.com]
  > But the problem is definitely NOT fixed and the headline is WRONG.
  What is your basis for this assertion?
  
  - Re: (Score:1)
    
    by 93 Escort Wagon ( 326346 ) writes:
    
    > What is your basis for this assertion?
    You mean besides the part of the summary that I quoted - where it says "Attempts to mitigate the bug have reduced the likelihood of it occurring"?
    That doesn't sound like a fix to me.
    - Re: (Score:3, Informative)
      
      by JBeretta ( 7487512 ) writes:
      
      Reading the discussion thread on GitHub, it would appear the developers are fairly certain it is fixed. I've seen lots of folks report that systems that were affected by the bug are no longer exhibiting the problem after patching.
      Is someone going to declare that it is, without any doubt, fixed? Probably not. But hundreds of tests, a very detailed explanation of EXACTLY where the problem was occurring (some sort of race condition?), and code to fix the errant behavior seem to show it is
    - Re: (Score:3)
      
      by _merlin ( 160982 ) writes:
      
      The "attempts to mitigate the bug" happened prior to the reliable reproduction script and this fix. People were recommending various configuration changes and things, but none of it reliably stopped the data corruption. This fix is an actual fix.
Coreutils 9.2 (Score:5, Informative)

by bill_mcgonigle ( 4333 ) * writes: on Saturday December 02, 2023 @11:05PM (#64050181) Homepage Journal

From
https://github.com/openzfs/zfs... [github.com]
The incorrect dirty check becomes a problem when the first block of a file is being appended to while another process is calling lseek to skip holes. It can happen that the dnode part is undirtied, while dirty records are still on the dnode for the next txg. In this case, lseek(fd, 0, SEEK_DATA) would not know that the file is dirty, and would go to dnode_next_offset(). Since the object has no data blocks yet, it returns ESRCH, indicating no data found, which results in ENXIO being returned to lseek()'s caller.
Since coreutils 9.2, cp performs sparse copies by default, that is, it uses SEEK_DATA and SEEK_HOLE against the source file and attempts to replicate the holes in the target. When it hits the bug, its initial search for data fails, and it goes on to call fallocate() to create a hole over the entire destination file.
This has come up more recently as users upgrade their systems, getting OpenZFS 2.2 as well as a newer coreutils. However, this problem has been reproduced against 2.1, as well as on FreeBSD 13 and 14.
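The cp behavior described above can be illustrated with a minimal Python sketch of a hole-preserving copy built on lseek() with SEEK_DATA and SEEK_HOLE. It is not coreutils' actual code (cp, as quoted, uses fallocate() to create holes; this sketch simply seeks past gaps and sets the final size with truncate()), the helper names `find_data_extents` and `sparse_copy` are hypothetical, and it assumes a platform where Python exposes `os.SEEK_DATA`/`os.SEEK_HOLE` (e.g. Linux). The important detail is the ENXIO branch: a copier that trusts SEEK_DATA treats ENXIO at offset 0 as "the whole file is a hole", so a filesystem that wrongly returns ENXIO for a file with pending data produces a destination full of zeros.

```python
import errno
import os
from typing import Iterator, Tuple


def find_data_extents(fd: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte ranges that contain data, via SEEK_DATA/SEEK_HOLE."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                # "No data at or after offset": the rest of the file is treated
                # as a hole. If the filesystem wrongly says this at offset 0
                # (as in the OpenZFS bug), the whole file looks like one hole.
                return
            raise
        # End of file counts as an implicit hole, so this never exceeds size.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, copying only data extents (hypothetical helper)."""
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(src_fd).st_size
        with open(dst_path, "wb") as dst:
            for start, end in find_data_extents(src_fd):
                os.lseek(src_fd, start, os.SEEK_SET)
                # Seeking ahead before writing leaves a hole on filesystems
                # that support sparse files.
                dst.seek(start)
                remaining = end - start
                while remaining > 0:
                    chunk = os.read(src_fd, min(remaining, 1 << 20))
                    if not chunk:
                        break
                    dst.write(chunk)
                    remaining -= len(chunk)
            # Set the final size; any unwritten tail stays a hole (reads as zeros).
            dst.truncate(size)
    finally:
        os.close(src_fd)
```

Because the copier has no independent way to know the source contains data, a wrong ENXIO surfaces as silently zero-filled output rather than as an error.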
It looks like Debian bookworm ships coreutils 9.1.
Somebody said RHEL 9 backported the 9.2 behavior to its coreutils 8.
Gentoo blokes get all the goodies and all the gremlins right away and seem to have been the canaries on this one. TYFYS.

ZFS development (Score:2)

by jd ( 1658 ) writes:

ZFS development has, in recent years, moved from FreeBSD to Linux. There is, apparently, no way to dual-license it, at least yet.
But its development on Linux matters, because that increases the number of eyes that can look at it whilst in development, so (in principle) accelerating that development.
More eyes didn't prevent this corruption bug, but it may well have reduced the time the bug has been allowed to survive, which is a Good Thing.
ZFS will be increasingly important, I suspect, over the next few year
- Re: (Score:2)
  
  by Bongo ( 13261 ) writes:
  
  I'm just curious, where and how do data centres protect against bit rot or similar issues?
  - Re: (Score:2)
    
    by Entrope ( 68843 ) writes:
    
    It depends on the data center and the scale of the system being deployed. Some applications use erasure codes, but those work better with multiple computers cooperating to increase redundancy. Some systems use RAID with multiple parity disks, but that requires the whole array to be in one physical location. Some places abstract those things behind an API that provides a file, block or object store.
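As a minimal illustration of the parity idea mentioned above, here is single-parity XOR, the principle behind RAID-5-style layouts: the parity block is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of the surviving blocks and the parity. The function names are hypothetical, and this is not any particular RAID implementation; layouts with multiple parity disks (RAID-6, RAID-Z2/Z3) and general erasure codes such as Reed-Solomon need additional, non-XOR parity to survive more than one failure, which this sketch does not attempt.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        if len(blk) != len(out):
            raise ValueError("all blocks in a stripe must have the same length")
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def make_parity(data_blocks: List[bytes]) -> bytes:
    # Single parity: P = D0 ^ D1 ^ ... ^ Dn-1
    return xor_blocks(data_blocks)


def rebuild_missing(stripe: List[Optional[bytes]], parity: bytes) -> bytes:
    """Recover the one data block marked None in `stripe` from the survivors and parity."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        # XOR parity gives one equation per stripe, so it can solve for one unknown.
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [blk for blk in stripe if blk is not None]
    return xor_blocks(survivors + [parity])


data = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(data)
print(rebuild_missing([data[0], None, data[2]], parity))  # b'efgh'
```

Losing a second block in the same stripe raises an error here, which is exactly the gap that double and triple parity are meant to close.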
Corrupted ZFS Data and one half of a mirror (Score:2)

by Mirnotoriety ( 10462951 ) writes:

Corrupted ZFS Data [oracle.com]

“Data corruption occurs when one or more device errors (indicating one or more missing or damaged devices) affects a top-level virtual device. For example, one half of a mirror can experience thousands of device errors without ever causing data corruption. If an error is encountered on the other side of the mirror in the exact same location, corrupted data is the result.”
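The quoted passage can be sketched as the read path of a two-way mirror: each side is tried in turn, and a copy is accepted only if its checksum matches the one stored in the parent block pointer. This is a simplified illustration, not ZFS code; `mirror_read` and `block_checksum` are hypothetical names, SHA-256 is an assumed stand-in for the pool's checksum (OpenZFS defaults to fletcher4), and a real implementation would also rewrite ("self-heal") the bad copy, which this sketch omits.

```python
import hashlib
from typing import Optional, Sequence


def block_checksum(block: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever checksum the pool uses.
    return hashlib.sha256(block).digest()


def mirror_read(copies: Sequence[Optional[bytes]], expected: bytes) -> bytes:
    """Return the first mirror copy whose checksum matches `expected`.

    `None` models a device error (unreadable copy). Data is lost only when
    every side of the mirror is bad at the same block.
    """
    for copy in copies:
        if copy is not None and block_checksum(copy) == expected:
            return copy
    raise OSError("all mirror copies of this block are damaged")


good = b"block contents"
expected = block_checksum(good)
print(mirror_read([None, good], expected) == good)  # True
```

As the passage says, errors on one side are harmless as long as the other side holds a good copy of the same block; only when both sides fail at that location does the read raise.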
- Re: (Score:2)
  
  by jd ( 1658 ) writes:
  
  Oracle ZFS and OpenZFS diverged after Oracle naively closed the source. OpenZFS, I believe, has RAID modes not available in Oracle ZFS. Whether there are other protections present is unclear to me, but it seems safe to say that Oracle's ZFS documentation is reasonably valid for now; you're still advised to double-check claims against up-to-date OpenZFS docs.
Please be aware (Score:2)

by jd ( 1658 ) writes:

The bug exists in Oracle Solaris' ZFS as well. There's no indication as to whether they've fixed their version or not.
