Gentoo ZOL on boot and root
- Grubbin'…
- Whole Disks vs Partitions
- Geronimo…
- Jump Into the Pool With Me Tonight!
- After Party
- Okay, Moving Along Here…
- Mad Props!
I want native ZFS goodness on Linux, so I spent some time exploring. Discerning current from more dated best practices proved challenging, and I found myself bumbling about a bit. I expect this situation to be rectified as uptake of ZFS On Linux continues, ZOL stabilizes, and comprehensive best practices are ferreted out. What follows are some personal notes-to-self and reflections documenting my adventures. The end objective is a minimally installed base Linux suitable for further build-out as a personal workstation. But first we’ve got some choices to understand.
"The perfect is the enemy of the good."
Grubbin'…
Ideally we’d like /boot and / to share the same zpool, at least from the management perspective of keeping our kernels and userlands in sync. Alas, ZFS on Linux development is still moving towards a 1.0 release; in the meantime it’s a rapidly moving target, at least compared to GRUB. Yes, GRUB supports ZOL. Sort of… GRUB’s devs are necessarily conservative about merging ZOL patches into something as important as a boot loader/manager. Hence, running a single ZFS pool for the / and /boot datasets can be problematic: on the one hand we want the latest ZOL for features, enhancements, bug fixes, etc.; on the other we need maximum reliability from GRUB.
One solution is to use grub-git. ZOL developers are invested in keeping GRUB patched for ZFS, so you’ll get at or near complete ZFS feature support. If that’s an important target for you, this approach may be attractive. The downside is that it’s grub-git. Maybe you don’t like living dangerously.
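On Gentoo, chasing GRUB’s bleeding edge might look something like this sketch (assuming the sys-boot/grub live ebuild is still in your tree; adapt the keywording to your portage setup):

# sketch: accept and build GRUB's live (9999) ebuild on Gentoo
echo "sys-boot/grub **" >> /etc/portage/package.accept_keywords
emerge --ask sys-boot/grub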
Another solution, in contrast to the above, prioritizes reliability and is willing to forgo features. It is accomplished through carefully chosen ZFS feature flags tuned to maximize compatibility with released GRUB versions. The downside is that you necessarily give up a lot of ZFS goodness by so limiting yourself.
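For the curious, a sketch of what that looks like at pool-creation time: the -d flag starts the pool with every feature flag disabled, and then you explicitly enable only the handful commonly cited as GRUB-safe. (The exact safe set below is my assumption; verify it against the GRUB release you intend to run. Pool name and device are illustrative.)

# sketch: a pool with only (assumed) GRUB-friendly features enabled
zpool create -d \
      -o feature@async_destroy=enabled \
      -o feature@empty_bpobj=enabled \
      -o feature@lz4_compress=enabled \
      -m none demopool /dev/sda1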
For those who like having their cake and eating it too: a potentially best-of-both-worlds solution is to break /boot and / out into separate pools. We can create the boot pool using a limited feature set known to be well supported by GRUB, and are then free to create a second pool for / and enjoy the full smorgasbord of ZOL features where it matters most. The downside is… that this is far from perfect. Hmmm… methinks it offers a compromise good enough for my needs. Yours may differ.
Whole Disks vs Partitions
You may have heard that ZFS likes whole disks. And you would be correct. Using whole disks is preferable if interoperability with other ZFS implementations that honor the "whole_disk" property is a concern. For example, on FreeBSD the whole_disk property is always set to true. This is ZOL on root and boot though, so we need to take a couple of things into consideration and understand some compromises here.
ZFS does some performance tweaks when using whole disks. For example, on illumos-based systems, ZFS enables the write cache. On Linux, ZFS will set the I/O elevator to noop to avoid unnecessary CPU overhead. When using partitions it will not attempt to manage these optimizations and leaves things as is. So using partitions with ZOL means we’re going to take a bit of a performance hit. Or are we?
The consensus I got on #zfsonlinux when inquiring about this is that it’s fine to enable "elevator=noop" on a partition-based setup like the one we’re using here. So feel free to tune your kernel boot parameters accordingly if you have performance concerns.
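For instance, the scheduler can be flipped at runtime via sysfs (handy for testing); a sketch, with the caveat that your default scheduler and sample output will vary:

# check the current scheduler (the bracketed entry is active)
root@sysresccd / % cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
# switch it for this boot only
root@sysresccd / % echo noop > /sys/block/sda/queue/scheduler

To make it stick, append elevator=noop to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub once your installed system is up and running GRUB.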
Geronimo…
Time to go! Almost. ZFS makes things dead easy, but we need the ability to use it. Classic chicken and the egg thang! Not to fear, as fearedbliss maintains a nice Gentoo-based system-rescue-cd-with-zfs. Recommended. Grab it from a torrent near you, then follow the yellow brick road until you have a bootable ISO.
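Getting the ISO onto a USB stick is the usual dd dance; a sketch, assuming a hypothetical ISO filename and that your stick shows up as /dev/sdc (triple-check that device before pulling the trigger):

dd if=sysresccd-zfs.iso of=/dev/sdc bs=4M && sync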
Archers are going to want to embed archzfs into an archiso. Archer Jesus Alvarez does a nice job packaging this stuff up for Arch.
Then head on back here ready to begin enjoying the bliss that is ZFS.
Jump Into the Pool With Me Tonight!
Okay, I like using real hardware when testing stuff. Yeah, I do know about virtual machines… they have their uses. I like bare metal. Get yourself booted. I shall presume that you are able to set the root passwd and get yourself ssh’d into toyland.
I will be using /dev/sda throughout this example. Make sure you’ve got the correct block device for your system else you may lose stuff you’d rather not…
root@sysresccd /root % lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0 698.7G  0 disk
sdb       8:16   0 698.7G  0 disk
sdc       8:32   1   3.8G  0 disk
└─sdc1    8:33   1   511M  0 part
sr0      11:0    1  1024M  0 rom
loop0     7:0    0 380.5M  1 loop /livemnt/squashfs
I’m going to be using GPT-based partitions. The command-line commandos among you may want to use sgdisk, in which case I’m quite sure you’re well familiar with man. I will use gdisk here. The attentive reader should be able to follow along and end up with something like this:
root@sysresccd / % gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.0
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-1465149134, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-1465149134, default = 1465149134) or {+-}size{KMGTP}: +4G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'
Command (? for help): n
Partition number (2-128, default 2):
First sector (34-1465149134, default = 8390656) or {+-}size{KMGTP}: +128M
Last sector (8652800-1465149134, default = 1465149134) or {+-}size{KMGTP}: +4M
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): ef02
Changed type of partition to 'BIOS boot partition'
Command (? for help): n
Partition number (3-128, default 3):
First sector (34-1465149134, default = 8660992) or {+-}size{KMGTP}: +128M
Last sector (8923136-1465149134, default = 1465149134) or {+-}size{KMGTP}: -350M
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): bf00
Changed type of partition to 'Solaris root'
Command (? for help): p
Disk /dev/sda: 1465149168 sectors, 698.6 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1397BE73-4653-4449-9E52-1A5FC53905DE
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1465149134
Partitions will be aligned on 2048-sector boundaries
Total free space is 1243102 sectors (607.0 MiB)
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048         8390655    4.0 GiB     BE00  Solaris boot
   2         8652800         8660991    4.0 MiB     EF02  BIOS boot partition
   3         8923136      1464432334    694.0 GiB   BF00  Solaris root
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
The operation has completed successfully.
Cool. Let’s grab a copy of that table:
root@sysresccd / % sgdisk --backup=./sda-gpt-part.table /dev/sda
Transferring to somewhere that survives a reboot is left as an exercise for the reader ;)
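Should disaster strike, sgdisk can replay that backup onto the disk:

root@sysresccd / % sgdisk --load-backup=./sda-gpt-part.table /dev/sda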
To recap:
root@sysresccd / % lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0 698.7G  0 disk
├─sda1    8:1    0     4G  0 part
├─sda2    8:2    0     4M  0 part
└─sda3    8:3    0   694G  0 part
sdb       8:16   0 698.7G  0 disk
sdc       8:32   1   3.8G  0 disk
└─sdc1    8:33   1   511M  0 part
sr0      11:0    1  1024M  0 rom
loop0     7:0    0 380.5M  1 loop /livemnt/squashfs
Time to make our boot zpool:
root@sysresccd / % zpool create -o version=28 -o ashift=9 -o cachefile= -m none \
> -R /mnt/gentoo boot /dev/sda1
See man zpool for the details of the version=28 flag; in short, 28 is the last on-disk format before feature flags, making it a really dandy shorthand for designating some nicely compatible grubbables…
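Want to confirm that the version stuck? zpool get will tell you; expect output along these lines:

root@sysresccd / % zpool get version boot
NAME  PROPERTY  VALUE    SOURCE
boot  version   28       local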
Also please take note of the use of the ashift flag. If you know your drive uses 4K sectors, set ashift=12. We can check with a little help from smartmontools:
root@sysresccd / % smartctl -i /dev/sda | grep Sector
Sector Size: 512 bytes logical/physical
This drive uses both 512-byte logical and physical sectors, hence ashift=9 is most appropriate. In the early days of 4K drives, drive manufacturers had to make their 4K drives lie in order to maintain compatibility with the widely deployed Windows XP base. In more modern times XP is dead, and drive manufacturers no longer need to play this game, so you’re more likely to get the truth. Still, confirm that your numbers for logical and physical are consistent.
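If you’d rather not lean on smartmontools, blockdev from util-linux can pull the same numbers straight from the kernel (logical sector size first, then physical); a quick cross-check:

root@sysresccd / % blockdev --getss --getpbsz /dev/sda
512
512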
Some are of the opinion that it is preferable to always use ashift=12 so as to maintain "forward compatibility" for the day when we’ve got our big boy pants on and upgrade to 4K drives. A potentially valid point, but I’ve news for you: if and when that day comes, I’m going to be getting down and dirty at the hardware level anyway. I therefore advise: tune for what you’ve got happenin' now, baby! ;D
But I digress! Let’s export that boot pool. Inquiring minds have questions? No worries. Shake it off. We’ll come back to it. All will become clear, young Padawan…
root@sysresccd / % zpool export boot
And now create our main root pool and export it:
root@sysresccd / % zpool create -o ashift=9 -o cachefile= -m none -O compression=lz4 \
> -R /mnt/gentoo freebird /dev/sda3
root@sysresccd / % zpool export freebird
I prefer to use distinctive names for my root pools, usually associated in some way with the bare-metal hardware. Here I’ve used "freebird". You are free to use whatever. On some systems, pools named rpool or tank are imported automatically, which may not be what you want. If I ever plug these drives into another system, I want to be in control of what gets imported and where. I also like having unique names from a management perspective. Meh… so I have to type a few extra commands…
We want to be using /dev/disk/by-id when we build up our system, so let’s reimport our pools as such now. I’ll run a few other commands that should be self-explanatory. If not, please rtfm ;D
root@sysresccd / % zpool import -d /dev/disk/by-id -R /mnt/gentoo -Na
root@sysresccd / % zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
boot      3.97G   130K  3.97G         -      -     0%  1.00x  ONLINE  /mnt/gentoo
freebird   692G  95.5K   692G         -     0%     0%  1.00x  ONLINE  /mnt/gentoo
root@sysresccd / % zpool status
  pool: boot
 state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
  scan: none requested
config:

        NAME                                               STATE     READ WRITE CKSUM
        boot                                               ONLINE       0     0     0
          ata-WDC_WD7500AYYS-01RCA0_WD-WCAPT0562110-part1  ONLINE       0     0     0

errors: No known data errors

  pool: freebird
 state: ONLINE
  scan: none requested
config:

        NAME                                               STATE     READ WRITE CKSUM
        freebird                                           ONLINE       0     0     0
          ata-WDC_WD7500AYYS-01RCA0_WD-WCAPT0562110-part3  ONLINE       0     0     0

errors: No known data errors
root@sysresccd / % zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
boot       104K  3.84G    29K  none
freebird  74.5K   670G    19K  none
I passed the -N flag there to tell zpool import not to mount the pools, so I could take a look at them and demonstrate a couple of other simple zfs commands. Also, in the event I make a mistake or typo, it’s easy to just destroy the ensuing cluster without having to worry about mopping up any actual mount points created on the file system.
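For example, if I fat-finger a dataset, the do-over is one command away (dataset name purely illustrative):

root@sysresccd / % zfs destroy -r freebird/TYPO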
Time to make some ZFS datasets:
root@sysresccd / % zfs create freebird/ROOT
root@sysresccd / % zfs create -o mountpoint=/ freebird/ROOT/gentoo
root@sysresccd / % zfs create -o mountpoint=/boot boot/gentoo
root@sysresccd / % zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
boot                  152K  3.84G    29K  none
boot/gentoo            30K  3.84G    30K  /mnt/gentoo/boot
freebird              132K   670G    19K  none
freebird/ROOT          38K   670G    19K  none
freebird/ROOT/gentoo   19K   670G    19K  /mnt/gentoo
That should cover your basic bases. I like to manage via container datasets. Moreover, I have a deep-seated need to keep my home dirs extra cozy in the winter, so:
root@sysresccd / % zfs create -o mountpoint=/home freebird/HOME
root@sysresccd / % zfs create -o mountpoint=/root freebird/HOME/root
The adventurous reader may deem it desirable, or even just too damn much fun, and feel irresistibly compelled to break out other datasets as they see fit for their needs, distro, etc. (cuz you is free, free, free baby!!) What follows is an example of how I might prep a Gentoo workstation:
root@sysresccd / % zfs create freebird/GENTOO
root@sysresccd / % zfs create -o mountpoint=/var/portage freebird/GENTOO/portage
root@sysresccd / % zfs create -o mountpoint=/var/portage/distfiles freebird/GENTOO/distfiles
root@sysresccd / % zfs create -o mountpoint=/var/tmp/portage freebird/GENTOO/build-dir
So now we’ve got something like this:
root@sysresccd / % zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
boot                         152K  3.84G    29K  none
boot/gentoo                   30K  3.84G    30K  /mnt/gentoo/boot
freebird                     327K   670G    19K  none
freebird/GENTOO               76K   670G    19K  none
freebird/GENTOO/build-dir     19K   670G    19K  /mnt/gentoo/var/tmp/portage
freebird/GENTOO/distfiles     19K   670G    19K  /mnt/gentoo/var/portage/distfiles
freebird/GENTOO/portage       19K   670G    19K  /mnt/gentoo/var/portage
freebird/HOME                 38K   670G    19K  /mnt/gentoo/home
freebird/HOME/root            19K   670G    19K  /mnt/gentoo/root
freebird/ROOT                 38K   670G    19K  none
freebird/ROOT/gentoo          19K   670G    19K  /mnt/gentoo
After Party
Oooh, la, la!!! So much ZFS fun. I’ve just decided to break out a couple more:
root@sysresccd / % zfs create -o mountpoint=/var/log freebird/GENTOO/log
root@sysresccd / % zfs create -o mountpoint=/var/cache freebird/GENTOO/cache
root@sysresccd / % zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
boot                         152K  3.84G    29K  none
boot/gentoo                   30K  3.84G    30K  /mnt/gentoo/boot
freebird                     399K   670G    19K  none
freebird/GENTOO              114K   670G    19K  none
freebird/GENTOO/build-dir     19K   670G    19K  /mnt/gentoo/var/tmp/portage
freebird/GENTOO/cache         19K   670G    19K  /mnt/gentoo/var/cache
freebird/GENTOO/distfiles     19K   670G    19K  /mnt/gentoo/var/portage/distfiles
freebird/GENTOO/log           19K   670G    19K  /mnt/gentoo/var/log
freebird/GENTOO/portage       19K   670G    19K  /mnt/gentoo/var/portage
freebird/HOME                 38K   670G    19K  /mnt/gentoo/home
freebird/HOME/root            19K   670G    19K  /mnt/gentoo/root
freebird/ROOT                 39K   670G    19K  none
freebird/ROOT/gentoo          20K   670G    20K  /mnt/gentoo
Or not….
root@sysresccd / % zfs destroy freebird/GENTOO/cache
root@sysresccd / % zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
boot                         152K  3.84G    29K  none
boot/gentoo                   30K  3.84G    30K  /mnt/gentoo/boot
freebird                     399K   670G    19K  none
freebird/GENTOO              114K   670G    19K  none
freebird/GENTOO/build-dir     19K   670G    19K  /mnt/gentoo/var/tmp/portage
freebird/GENTOO/distfiles     19K   670G    19K  /mnt/gentoo/var/portage/distfiles
freebird/GENTOO/log           19K   670G    19K  /mnt/gentoo/var/log
freebird/GENTOO/portage       19K   670G    19K  /mnt/gentoo/var/portage
freebird/HOME                 38K   670G    19K  /mnt/gentoo/home
freebird/HOME/root            19K   670G    19K  /mnt/gentoo/root
freebird/ROOT                 39K   670G    19K  none
freebird/ROOT/gentoo          20K   670G    20K  /mnt/gentoo
It’s just that easy!
Okay, Moving Along Here…
Remember that -N flag we threw up in zpool import’s face? Let’s get our datasets mounted up and ready to ride:
root@sysresccd / % zfs mount -a
root@sysresccd / % zfs mount
freebird/ROOT/gentoo         /mnt/gentoo
boot/gentoo                  /mnt/gentoo/boot
freebird/HOME                /mnt/gentoo/home
freebird/HOME/root           /mnt/gentoo/root
freebird/GENTOO/portage      /mnt/gentoo/var/portage
freebird/GENTOO/distfiles    /mnt/gentoo/var/portage/distfiles
freebird/GENTOO/build-dir    /mnt/gentoo/var/tmp/portage
freebird/GENTOO/log          /mnt/gentoo/var/log
Sweet! Chroot into e.g. /mnt/gentoo and you’re ready to rock it the rest of the way off into the sunset as per your distro of choice’s installation instructions.
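The chroot dance itself is standard handbook fare; a sketch, assuming you’ve already unpacked a stage3 (or your distro’s equivalent) into /mnt/gentoo:

root@sysresccd / % cp -L /etc/resolv.conf /mnt/gentoo/etc/
root@sysresccd / % mount -t proc proc /mnt/gentoo/proc
root@sysresccd / % mount --rbind /sys /mnt/gentoo/sys
root@sysresccd / % mount --rbind /dev /mnt/gentoo/dev
root@sysresccd / % chroot /mnt/gentoo /bin/bash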
Mad Props!
In addition to the handy, dandy, awesome-sauce rescue CD, I have also drawn inspiration from fearedbliss’s guide to installing Gentoo Linux on ZFS, particularly with regard to creating the boot and root pools. I also derive a lot of Gentoo-tuned ZFS dataset management inspiration from ryao’s guide. Complemented, of course, by diligent study of the Gentoo Install Guide.
Tally Ho!!