Tuesday, January 13, 2009

Btrfs - Next generation file system for Linux

Source: btrfs homepage

The most recent development version of the Linux kernel (2.6.29-rc1) included a pleasant surprise: the development version of btrfs has been added to the kernel source tree.

There has been quite a lot of filesystem work done in the past few kernel versions. The ext4 file system was added as a development version late in 2006 (Initial git commit) and was declared stable exactly two years later (commit to rename ext4dev to ext4). ext4 is a nice evolution of the ext series of file systems and provides a number of new features such as larger storage limits and extent-based storage. See the Wikipedia article on ext4 for more detail. But ext4 is still just a file system and doesn't really compete with the newest kid on the block, ZFS.

ZFS is the feature-rich, "rampant layering violating" file system that Sun has developed for its version of UNIX, Solaris. ZFS has a few key features that make Linux users green with envy: quick and easy snapshots of data, data checksumming, and integration of several tools to manage disks and file systems.

Snapshots: ZFS is based on a copy-on-write architecture that writes a new copy of the data every time it changes. Once the new version of the data is written, the old version is marked as deleted and the space can be reclaimed. Implementing a snapshot system on top of this is trivial: all you do is instruct the operating system not to mark the old data as deleted, and the changes are preserved. This also means that a snapshot occupies no more space than the difference between the two versions.
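
The snapshot idea above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how ZFS is actually implemented: the class name CowStore, its methods, and the one-block-per-file layout are all hypothetical and exist purely for illustration. It shows that a snapshot copies references rather than data, and that old blocks are reclaimed only when nothing points at them any more.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical; not ZFS's real on-disk design)."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes; blocks are never modified in place
        self.live = {}        # file name -> block id of the current version
        self.snapshots = {}   # snapshot name -> frozen copy of the name -> block id map
        self._next_id = 0

    def write(self, name, data):
        # Copy-on-write: new data always goes to a fresh block.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id
        self._reclaim()

    def snapshot(self, snap_name):
        # A snapshot copies only the references, never the data itself.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]

    def _reclaim(self):
        # Free every block that neither the live view nor a snapshot references.
        referenced = set(self.live.values())
        for table in self.snapshots.values():
            referenced.update(table.values())
        for block_id in list(self.blocks):
            if block_id not in referenced:
                del self.blocks[block_id]


store = CowStore()
store.write("a.txt", b"v1")
store.write("b.txt", b"other")
store.snapshot("before")
store.write("a.txt", b"v2")   # the old block survives because "before" still points at it
```

After the last write, the store holds three blocks: the unchanged b.txt is shared between the live view and the snapshot, so the snapshot costs only the one extra block that differs.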

Checksumming: All data written to a ZFS file system is checksummed to ensure its validity. Hard drives can silently corrupt data. This has always been an issue, but with the exponential growth in storage requirements, corrupted data has become much more common. To help mitigate the risk of silent data corruption, ZFS stores a checksum of all the data it stores and validates the data again before relaying it to the operating system. If one copy of the data has been corrupted, this is identified on read and the data is seamlessly copied from another source.
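
The verify-on-read behaviour described above can be illustrated with a small Python sketch. It is a simplified model, not ZFS code: the MirroredStore class, its in-memory "mirrors", and the choice of SHA-256 via hashlib are assumptions made for the example. Each write records a checksum, each read validates the data against it, and a corrupted copy is repaired from a good one.

```python
import hashlib


class MirroredStore:
    """Toy checksummed store with redundant copies (hypothetical illustration)."""

    def __init__(self, copies=2):
        self.mirrors = [dict() for _ in range(copies)]  # stand-ins for separate disks
        self.checksums = {}                             # key -> expected digest

    def write(self, key, data):
        # Record the checksum, then store the data on every mirror.
        self.checksums[key] = hashlib.sha256(data).digest()
        for mirror in self.mirrors:
            mirror[key] = data

    def _is_valid(self, key, data):
        return hashlib.sha256(data).digest() == self.checksums[key]

    def read(self, key):
        # Return the first copy whose checksum matches, repairing any bad copies.
        for mirror in self.mirrors:
            data = mirror[key]
            if self._is_valid(key, data):
                for other in self.mirrors:
                    if not self._is_valid(key, other[key]):
                        other[key] = data  # heal the corrupted copy
                return data
        raise IOError("all copies of %r failed checksum validation" % key)


store = MirroredStore()
store.write("report", b"hello")
store.mirrors[0]["report"] = b"hellp"   # simulate silent corruption on one disk
data = store.read("report")             # good copy is returned, bad copy is repaired
```

If every copy fails validation, the read raises an error instead of handing corrupt data to the caller, which is the point of checksumming in the first place.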

Integration of several tools: If you have a server with several hard drives in it, you potentially use 3-5 different utilities to abstract away the fact that your data is stored on multiple disks. You would have a RAID setup to make several hard drives appear as one larger or redundantly stored disk. You would have a Logical Volume created on that disk so you have the flexibility to grow/shrink/move the volume(s), and finally you would have your choice of filesystem set up to actually allow the OS to access the data. On top of all this you may have a backup system implemented so that you can roll back to previous versions of the data (snapshots), and possibly a method to identify and resolve data corruption (checksumming). ZFS unites all this functionality under one system that is more flexible and easier to manage.

From just the three features listed above you can see why Linux users would want to have ZFS. But there is an issue. ZFS is free software, but it is not compatible with the GPL licence of the Linux kernel. ZFS code (and a binary implementation in OpenSolaris) is freely available, but it is released under Sun's GPL-incompatible CDDL licence. Sun has released a few of their major software projects under GPL and GPL-compatible licences (OpenOffice, Java), but as ZFS seems to be a major technological advantage for Solaris, it doesn't seem like it will be re-licenced anytime soon.

So with the licencing incompatibility of ZFS and the "that just makes sense" factor of the feature set, Chris Mason started to work on a GPL-licenced competitor to ZFS. It's called Btrfs (B-Tree FS, or Butter FS) and it implements most of the major features of ZFS. It has snapshots, checksumming, multiple hard drive support, and a few other features. Btrfs was announced on June 12, 2007 (lkml post), by which point it already implemented a number of the key features. After over a year of development it has been merged into the Linux kernel as btrfs-unstable (git commit) and will be part of the general 2.6.29 kernel release. It is still an experimental file system and not ready for production use yet, but having the code in the main kernel tree is a major step on the way to becoming a workable filesystem.

The Btrfs feature set is squarely aimed at ZFS, but it does have big shoes to fill. ZFS has been in continuous development for over 4 years now and was included as a fully supported, production-ready filesystem in the 6/06 (June 2006) update to Solaris 10.

Linux development continues on at a breakneck pace...

Related:
Oracle Presentation with Benchmarks
Article on Snapshots & Subvolumes
Wikipedia Article on btrfs
Diagrams showing RAID Levels
btrfs gitweb
btrfs utilities gitweb
The last word on ZFS