Forcing zpool to use /dev/disk/by-id in Ubuntu Xenial


Solution 1

I know this thread is sort of stale, but there is an answer. You need to update your cache file after you import. This example shows the default location for the cache file.

$> sudo zpool export POOL
$> sudo zpool import -d /dev/disk/by-id POOL
$> sudo zpool import -c /etc/zfs/zpool.cache
$> sudo zpool status POOL
NAME                                  STATE     READ WRITE CKSUM
POOL                                  ONLINE       0     0     0
  raidz1-0                            ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
    ata-Hitachi_HUA722020ALA330_[..]  ONLINE       0     0     0
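
If the cache file still ends up recording the short /dev/sd* names, another option - not part of the original answer, so treat it as a sketch to verify on your own system - is to set the pool's cachefile property explicitly after the by-id import above, which rewrites the pool's entry in the default cache file:

$> sudo zpool set cachefile=/etc/zfs/zpool.cache POOL
$> sudo zpool status POOL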

Solution 2

Once in a while, zpool import -d /dev/disk/by-id doesn't work.

I've noticed this in more than one environment. I have an import script that, besides doing some magic logic and showing physically attached ZFS devices, basically does this:

zpool import -d /dev/disk/by-id POOL
zpool export POOL
zpool import POOL

The second time around, even without the -d switch, the pool imports by device ID, even if it didn't the first time with the explicit command.

It's possible this was just due to a ZFS bug during a span of a few weeks or months (a year or two ago) and is no longer necessary. I suppose I should have filed a bug report, but it was trivial to work around.
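
For reference, a minimal sketch of such a wrapper (not the actual script from this answer; the pool name "tank" and the argument handling are assumptions) could look like this:

#!/bin/sh
# Re-import a pool so it gets registered under /dev/disk/by-id names.
set -eu
POOL="${1:-tank}"    # assumed example pool name

zpool import -d /dev/disk/by-id "$POOL"   # first import, forcing the by-id search path
zpool export "$POOL"                      # bounce the pool once
zpool import "$POOL"                      # second import usually sticks with the by-id names

zpool status "$POOL"                      # verify which device names were recorded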

Comments

  • Ruben Schade, almost 2 years

    I'm giving the bundled OpenZFS on Ubuntu 16.04 Xenial a try.

    When creating pools, I always reference drives by their serials in /dev/disk/by-id/ (or /dev/gpt/ on FreeBSD) for resiliency. Drives aren't always in the same order in /dev when a machine reboots, and if you have other drives in the machine the pool may fail to mount correctly.

    For example, running zpool status on a 14.04 box I get this:

    NAME                                  STATE     READ WRITE CKSUM
    tank                                  ONLINE       0     0     0
      raidz1-0                            ONLINE       0     0     0
        ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
        ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
        ata-Hitachi_HDS722020ALA330_[..]  ONLINE       0     0     0
        ata-Hitachi_HUA722020ALA330_[..]  ONLINE       0     0     0
    

    But when I create a new pool on 16.04 with this (abbreviated):

    zpool create tank raidz \
        /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
        /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
        /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..] \
        /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..]
    

    I get this with zpool status:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sde     ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sda     ONLINE       0     0     0
    

    It looks like zpool followed the symlinks, rather than referencing them.
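
    As an aside, the names under /dev/disk/by-id/ are just symlinks to the short /dev/sdX nodes, which is presumably what zpool resolved here; something like the following shows the mapping (the serial in the second command is a placeholder):

    $ ls -l /dev/disk/by-id/ | grep -w sdf
    $ readlink -f /dev/disk/by-id/ata-Hitachi_HDS723030ALA640_[..]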

    Is there a way to force zpool on 16.04 to respect my drive references when creating a pool? Or, alternatively, are my misgivings about what it's doing here misplaced?

    Update: Workaround

    I found a thread for zfsonlinux on GitHub that suggested a workaround. Create your zpool with /dev/sdX devices first, then do this:

    $ sudo zpool export tank
    $ sudo zpool import -d /dev/disk/by-id -aN
    

    I would still prefer to be able to do this with the initial zpool create though if possible.
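
    If your zfsutils packaging ships an /etc/default/zfs with a ZPOOL_IMPORT_PATH setting (an assumption worth checking on 16.04), pointing it at by-id should also make the boot-time import prefer those names:

    # /etc/default/zfs
    ZPOOL_IMPORT_PATH="/dev/disk/by-id"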

    • Admin, about 8 years
      It doesn't matter how you create them. If it reverts to /dev/sd? device names, the zpool export and zpool import -d will work anyway. BTW, unless you really need every byte of space, use two mirrored pairs rather than raidz. raidz's performance is better than raid-5 but still much worse than raid-10 or ZFS mirrored pairs. It's also easier to expand a pool made up of mirrored pairs: just add two disks at a time. With raidz, you have to replace each of the drives with larger drives, and only when you've replaced all of them will your pool have more space available.
    • Admin, about 8 years
      I still have some raid-z pools, and regret having made them. When I can afford to buy replacement disks, I'll create new pools with mirrored pairs and use zfs send to copy my data to the new pools. Actually, raid-z is OK for my MythTV box where performance isn't critical unless I'm running 6 or 8 transcode jobs at once. Changing to mirrored pairs would be very noticeable on the pool where my /home directory lives.
    • Admin, about 8 years
      Oh, and add a pair of SSDs, partitioned to give a mirrored pair of smallish (4 GB or so is plenty) log (i.e. ZIL, or ZFS Intent Log) devices, and two large (the remainder of the SSDs?), non-mirrored cache devices for L2ARC. (A sketch of the corresponding zpool add commands follows this comment thread.)
    • Admin, about 8 years
      @cas Keep in mind that log and cache devices have completely different usage patterns: the first is hit by a large amount of data and needs high endurance/TBW as well as low latency and power-loss protection capacitors; mirroring is optional, for safety. The second needs high read IOPS, and mirroring is only useful for availability and not losing the cache (if you don't use Solaris 11, which has persistent L2ARC). I would suggest splitting instead of mirroring, so you get the best for each use case.
    • Admin, about 8 years
      The mirroring of the ZIL is so you can get away with using ordinary cheap SSDs rather than expensive ones with large capacitors to guard against power loss. IMO, mirroring of the ZIL is not optional, no matter what kind of SSDs you have - if your ZIL dies, you lose all the yet-to-be-written data in it and potentially corrupt your pool. As for L2ARC, I specifically said NOT to mirror them... mirroring the L2ARC cache is a waste of time, money, and good SSD space (and would do nothing to prevent losing the cache - where did you get that idea from?)
    • Admin, about 8 years
      A basic Q turned into a meta ZFS discussion hah, but some interesting advice, thanks. I usually use mirrored pairs, but this is a dumb backup HP MicroServer samba target where performance isn't an issue and money is tight. Works just fine.
    • Admin, about 8 years
      :) BTW, my brain wasn't working right when I explained the reason for mirroring the ZIL. It's not to guard against power loss; that's complete nonsense and I should never have said it. It's to guard against failure of the ZIL drive, i.e. a raid-1 mirror for the ZIL. Two reasonably-priced SSDs are, in general, better than one extremely expensive one (unless the more expensive SSD has a much faster interface, like PCI-e vs SATA). And a UPS is essential: cheap protection against power loss.
    • Admin, about 8 years
      @cas A mirrored ZIL protects against SLOG device failure at the same time as an unexpected shutdown. Under normal operations, the ZIL is write-only, and writes to persistent storage are made from RAM (the ARC). If the system shuts down unexpectedly, the intent log (ZIL, SLOG) is used to finish the writes that were interrupted. Only if the unexpected shutdown coincides with the failure of a SLOG device do you need a redundant SLOG to recover the interrupted writes. For most non-server (and many server) workloads, a SLOG is overkill, as the ZIL really only comes into play with synchronous writes.
    • Admin, almost 4 years
      Does this work on ZFS root pools? I'm using Proxmox and they also use /dev/sda, etc...
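
    For anyone following the log/cache advice in the comments above, the corresponding commands are roughly as sketched below; the device paths and partition numbers are placeholders, and whether a SLOG/L2ARC helps at all depends on the workload, as discussed:

    # Mirrored SLOG (ZIL) on two small SSD partitions (placeholder names):
    $ sudo zpool add tank log mirror /dev/disk/by-id/ata-SSD_A_[..]-part1 /dev/disk/by-id/ata-SSD_B_[..]-part1
    # Two separate, non-mirrored L2ARC cache devices:
    $ sudo zpool add tank cache /dev/disk/by-id/ata-SSD_A_[..]-part2 /dev/disk/by-id/ata-SSD_B_[..]-part2
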
  • Wouter, almost 4 years
    In your case you got disks named ata-..., but in my case they're called wwn-.... I understand there's a nuance, but I find ata- more practical because the serial number is in the name. Can you tell me how to switch from wwn to ata names?
  • Steve O, almost 4 years
    I suppose the status names depend on how you imported the disks. The names in /dev/disk/by-id are assigned by the OS (Ubuntu in my case). How those names are derived is another subject.
  • Wouter, almost 4 years
    I read that you can move/delete the wwn-* symlinks and just run the command again; this time the ata-* names will be discovered and used. I have not tried it yet. (An alternative using vdev_id.conf is sketched at the end of this thread.)
  • Steve O, almost 4 years
    Hmmm... sounds pretty sketchy to me. I wouldn't try that in a production environment. I would use a disposable VM for that little experiment.
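
  Regarding the wwn- vs ata- naming above: rather than moving or deleting symlinks under /dev, a different technique (assuming your ZFS packages ship the vdev_id udev helper) is to define your own stable names in /etc/zfs/vdev_id.conf and import from /dev/disk/by-vdev. The alias names and wwn- paths below are placeholders:

    # /etc/zfs/vdev_id.conf - map chosen names onto the existing by-id links
    alias bay0  /dev/disk/by-id/wwn-[..]
    alias bay1  /dev/disk/by-id/wwn-[..]

    # Regenerate /dev/disk/by-vdev/* and re-import using those names
    $ sudo udevadm trigger
    $ sudo zpool export tank
    $ sudo zpool import -d /dev/disk/by-vdev tank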