2015/12/17

Fixing the FreeBSD ashift (4k) problem

Originally sourced and reformatted from “Fixing the ZFS ashift problem on FreeBSD”.

The original blog appears to have been unmaintained for a long time, and what was once very readable now looks like crap. I doubt the author will mind another usable copy with better formatting floating around.

As you’re probably aware (since you’re reading this), some of the new hard drives with “Advanced Format” lie about their sector size and thus the partitioning and/or file systems end up unaligned with the disk and performance suffers. Boo, Western Digital. You guys are worse than Hitler. But, since the WD20EARS was cheap and I didn’t think anyone would design something so stupid, I bought it.
Of course, ZFS is smart enough to use the larger sector size through the “ashift” setting, but only if it actually knows the sector size of the drive. This option is set at pool (or rather VDEV) creation and can’t be changed.
The easy way around this is to use gnop to prevent the drive from lying.

# zpool create tank raidz ada0 ada1 ada2
# zdb | grep ashift
       ashift: 9

Oh noes! 2^9 is 512, the wrong sector size. Let’s try that again.
 
# zpool destroy tank
# gnop create -S 4096 ada0
# zpool create tank raidz ada0.nop ada1 ada2
# zdb | grep ashift
       ashift: 12

2^12 is 4096, as it should be. Note that you should only need to tag one of the drives, since zpool is clever enough to use the largest sector size among them. Also note that you need to do this every time you add a new VDEV, otherwise you’ll end up with varying ashift values within your pool. The only problem now is that the drive list is a bit ugly, but that’s easily fixed.

# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:
        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1-0     ONLINE       0     0     0
            ada0.nop   ONLINE       0     0     0
            ada1       ONLINE       0     0     0
            ada2       ONLINE       0     0     0
# zpool export tank
# gnop destroy ada0.nop
# zpool import tank
# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0

There we go. Destroying the gnop node just removes the sector size override; the data is safe and sound. You’ve now got ashift set right without fiddling with patches and such.
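
If you’d rather not do the powers of two in your head, the check can be scripted. What follows is only a minimal sketch, not something from the original post: the helper names ashift_to_bytes and report_ashift are made up for illustration, and it assumes zdb prints lines of the form “ashift: N” as in the transcripts above.

```shell
#!/bin/sh
# Hypothetical helpers, named here for illustration only.

# Convert an ashift value (log2 of the sector size) to bytes.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Read zdb-style output on stdin and print every ashift line
# together with the sector size it corresponds to.
report_ashift() {
    grep 'ashift:' | while read -r _label value; do
        echo "ashift $value = $(ashift_to_bytes "$value") bytes"
    done
}

# Usage on a live system (as root):
#   zdb | report_ashift
```

A pool with several VDEVs prints one line per VDEV, so a stray 512 among the 4096s is easy to spot.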

Creation Example

  tanker: ~ # bash
  [root@tanker ~]# gnop create -S 4096 ada0
  [root@tanker ~]# gnop create -S 4096 ada1
  [root@tanker ~]# zpool create gozer mirror ada0.nop ada1.nop
  [root@tanker ~]# zpool status -v
    pool: gozer
   state: ONLINE
    scan: none requested
  config:
  
   NAME          STATE     READ WRITE CKSUM
   gozer         ONLINE       0     0     0
     mirror-0    ONLINE       0     0     0
       ada0.nop  ONLINE       0     0     0
       ada1.nop  ONLINE       0     0     0
  
  errors: No known data errors
  [root@tanker ~]# zpool export gozer
  [root@tanker ~]# gnop destroy ada0.nop
  [root@tanker ~]# gnop destroy ada1.nop
  [root@tanker ~]# zpool import gozer
  [root@tanker ~]# zpool status -v
    pool: gozer
   state: ONLINE
    scan: none requested
  config:
  
   NAME        STATE     READ WRITE CKSUM
   gozer       ONLINE       0     0     0
     mirror-0  ONLINE       0     0     0
       ada0    ONLINE       0     0     0
       ada1    ONLINE       0     0     0
  
  errors: No known data errors
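
Adding a VDEV Example

As noted earlier, the same trick has to be repeated whenever a new VDEV is added. The following is a rough sketch of how that could be wrapped up, under stated assumptions: the function names run and add_raidz_4k, the DRY_RUN variable, and the device names ada3, ada4 and ada5 are all hypothetical. By default it only prints the commands, so you can review them before running anything for real.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# add_raidz_4k POOL DISK1 DISK2 ...
# Adds a raidz VDEV to POOL, forcing 4096-byte sectors through a
# temporary gnop node on the first disk, then removes the node again.
# Stops at the first command that fails.
add_raidz_4k() {
    pool=$1
    first=$2
    shift 2
    run gnop create -S 4096 "$first" &&
    run zpool add "$pool" raidz "$first.nop" "$@" &&
    run zpool export "$pool" &&
    run gnop destroy "$first.nop" &&
    run zpool import "$pool"
}

# Dry run (prints the commands only):
#   add_raidz_4k tank ada3 ada4 ada5
# Real run (as root):
#   DRY_RUN=0 add_raidz_4k tank ada3 ada4 ada5
```

Afterwards, zdb | grep ashift should show 12 for the new VDEV as well as the old ones.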