How do I align my partition table properly?


Solution 1

A friend of mine pointed out that I can just run mkfs.ext4 directly on /dev/md1 without partitioning anything, so I deleted the partition and did that; it appears to be formatting now.

Solution 2

Since alignment pops up in a lot of places -

  • "Advanced Format" hard drives with 4k blocks
  • SSDs
  • RAID
  • LVM

- I'll expand a bit on the question.

Aligning partitions

"Linux on 4kB-sector disks" (IBM developerWorks) walks through the steps with fdisk, parted and GPT fdisk.

With fdisk:

sudo fdisk /dev/XXX 
c # turn off DOS compatibility
u # switch to sector units
p # print current partitions, check that start sectors are multiples of 8

# for a new partition:
n # new partition
<select primary/extended and partition #>
first sector: 2048
  # 2048 is the default in recent fdisk,
  # and is compatible with Vista and Win 7,
  # 4k-sector disks and all common RAID stripe sizes
<accept or enter the last sector>
w # write the new partition table and exit
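The "multiples of 8" check and the choice of 2048 as the first sector are plain arithmetic on 512-byte sectors. A minimal sketch, assuming 512-byte logical sectors and an illustrative set of boundaries (the names `alignment_report` and `BOUNDARIES` are hypothetical):

```python
from __future__ import annotations

# Assumption: fdisk reports positions in 512-byte logical sectors.
LOGICAL_SECTOR = 512

# Illustrative boundaries a partition start might need to respect.
BOUNDARIES = {
    "4 KiB physical sector": 4 * 1024,
    "64 KiB mdadm chunk": 64 * 1024,
    "1 MiB": 1024 * 1024,
}


def alignment_report(start_sector: int) -> dict[str, bool]:
    """For each boundary, report whether the partition start is a multiple of it."""
    start_bytes = start_sector * LOGICAL_SECTOR
    return {name: start_bytes % size == 0 for name, size in BOUNDARIES.items()}


if __name__ == "__main__":
    # 63 was the old DOS-compatible default; 2048 is the modern default.
    for sector in (63, 2048):
        print(sector, alignment_report(sector))
```

Sector 63 fails every check, while 2048 (1 MiB) is a multiple of all three, which is why it is a safe default.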

Aligning the file system

This is primarily relevant for RAID (levels 0, 5 and 6; not level 1); the file system performs better if it is created with knowledge of the stripe sizes.

It can also be used for SSDs if you wish to align the file system to the SSD erase block size (Theodore Tso, Linux kernel developer).

In the OP's case, mkfs apparently auto-detected the optimal settings (its output shows "Stride=16 blocks, Stripe width=48 blocks"), so no further action was required.

If you wish to verify, for RAID the relevant parameters are:

  • block size (file system block size, e.g. 4096)
  • stripe size (same as mdadm chunk size, e.g. 64k)
  • stride: stripe size / block size (e.g. 64k / 4k = 16)
  • stripe-width: stride * number of data disks (e.g. a 4-disk RAID 5 has 3 data disks; 16 * 3 = 48)

From the Linux RAID Wiki. See also this simple calculator for different RAID levels and numbers of disks.
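The RAID arithmetic can be sketched in a few lines. This is a minimal illustration, assuming sizes in bytes and only RAID 0, 5 and 6 (the helper names are hypothetical). The printed command uses mke2fs's `-E stride=...,stripe-width=...` extended options and is illustrative only; as the question's output shows, mkfs may pick these values up automatically.

```python
from __future__ import annotations


def data_disks(level: int, n_disks: int) -> int:
    """Disks holding data (not parity) in each stripe."""
    if level == 0:
        return n_disks
    if level == 5:
        return n_disks - 1
    if level == 6:
        return n_disks - 2
    raise ValueError("this sketch only handles RAID 0, 5 and 6")


def ext4_raid_geometry(chunk: int, fs_block: int, level: int, n_disks: int) -> tuple[int, int]:
    """Return (stride, stripe_width), both measured in file system blocks."""
    stride = chunk // fs_block
    return stride, stride * data_disks(level, n_disks)


if __name__ == "__main__":
    # The question's array: 4-disk RAID 5, 64 KiB chunks, 4096-byte fs blocks.
    stride, width = ext4_raid_geometry(64 * 1024, 4096, level=5, n_disks=4)
    print(stride, width)  # 16 48
    print(f"mkfs.ext4 -E stride={stride},stripe-width={width} /dev/md1")
```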

For SSD erase block alignment the parameters are:

  • fs block size (e.g. 4096)
  • SSD erase block size (e.g. 128k)
  • stripe-width: erase-block-size / fs-block-size (e.g. 128k / 4k = 32)

From Theodore's SSD post.
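The SSD case is a single division. A minimal sketch, assuming a 128 KiB erase block purely as an example (the real value depends on the drive) and a hypothetical helper name:

```python
def ssd_stripe_width(erase_block: int, fs_block: int = 4096) -> int:
    """Express the SSD erase block size in file system blocks (ext4 stripe-width)."""
    if erase_block % fs_block:
        raise ValueError("erase block size should be a multiple of the fs block size")
    return erase_block // fs_block


if __name__ == "__main__":
    print(ssd_stripe_width(128 * 1024))  # 32
```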

Aligning LVM extents

The potential issue is that LVM creates a 192k header. This is a multiple of 4k (so no issue with 4k-block disks) but may not be a multiple of RAID stripe size (if LVM runs on a RAID) or SSD erase block size (if LVM runs on SSD).

See Theodore's post for the workaround.
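Whether the 192k LVM header causes trouble is a simple divisibility check against whatever boundary matters for the underlying device. A minimal sketch; the 192 KiB figure comes from the text, while the candidate boundaries are examples chosen for illustration:

```python
KIB = 1024
LVM_HEADER = 192 * KIB  # header size quoted in the text


def is_aligned(offset: int, boundary: int) -> bool:
    """True if `offset` falls exactly on a multiple of `boundary`."""
    return offset % boundary == 0


if __name__ == "__main__":
    examples = {
        "4 KiB sector": 4 * KIB,
        "64 KiB RAID chunk": 64 * KIB,
        "128 KiB SSD erase block": 128 * KIB,
        "256 KiB full stripe (4 data disks x 64 KiB)": 256 * KIB,
    }
    for name, size in examples.items():
        print(f"{name}: {is_aligned(LVM_HEADER, size)}")
```

With these example sizes, 192 KiB lines up with 4 KiB sectors and 64 KiB chunks, but not with a 128 KiB erase block or a 256 KiB full stripe.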

Solution 3

I find this way to be the easiest:

parted -a opt /dev/md0
(parted) u MiB
(parted) rm 1
(parted) mkpart primary 1 100%

Alternatively, a quicker and dirtier method is:

(parted) mkpart primary ext4 1 -1

Solution 4

It seems that mkfs.ext4 wants file systems on your RAID to start on a 64 KiB boundary. If you use the whole device, the file system starts at offset 0, which is of course also a multiple of 64 KiB.

Most partitioning tools nowadays use a 1 MiB boundary by default anyway (older versions of fdisk may not).

The reason is that many hard disks and SSDs use physical sectors that are bigger than the logical sectors they expose. As a result, when you read a 512-byte logical sector, the hardware actually has to read a larger amount of data, and a write that covers only part of a physical sector forces a read-modify-write.

In the case of your software RAID device, something similar happens: data on it is stored in "chunks" (64 KiB on the question's array, per "Chunk Size : 64K" in the mdadm output).
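The cost of misalignment can be made concrete by counting how many chunks one write touches. A minimal sketch, assuming 512-byte logical sectors and 64 KiB chunks; `chunks_touched` is a hypothetical helper, and a partition at sector 63 stands in for the old DOS-style layout:

```python
CHUNK = 64 * 1024  # chunk size of the question's array
SECTOR = 512       # assumed logical sector size


def chunks_touched(partition_start_sector: int, fs_offset: int, length: int,
                   chunk: int = CHUNK) -> int:
    """Count the chunks covered by a write of `length` bytes at `fs_offset` in the partition."""
    first = partition_start_sector * SECTOR + fs_offset
    last = first + length - 1
    return last // chunk - first // chunk + 1


if __name__ == "__main__":
    # One chunk-sized write at the very start of the file system.
    for start in (0, 63, 2048):
        print(start, chunks_touched(start, fs_offset=0, length=CHUNK))
```

A chunk-sized write fits in one chunk when the partition starts at 0 or 2048, but straddles two chunks when it starts at sector 63.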


Author: Jorge Castro

Updated on September 17, 2022

Comments

  • Jorge Castro

    I am in the process of building my first RAID5 array. I've used mdadm to create the following set up:

    root@bondigas:~# mdadm --detail /dev/md1
    /dev/md1:
            Version : 00.90
      Creation Time : Wed Oct 20 20:00:41 2010
         Raid Level : raid5
         Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
      Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
       Raid Devices : 4
      Total Devices : 4
    Preferred Minor : 1
        Persistence : Superblock is persistent
    
        Update Time : Wed Oct 20 20:13:48 2010
              State : clean, degraded, recovering
     Active Devices : 3
    Working Devices : 4
     Failed Devices : 0
      Spare Devices : 1
    
             Layout : left-symmetric
         Chunk Size : 64K
    
     Rebuild Status : 1% complete
    
               UUID : f6dc829e:aa29b476:edd1ef19:85032322 (local to host bondigas)
             Events : 0.12
    
        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           4       8       64        3      spare rebuilding   /dev/sde
    

    While that's going I decided to format the beast with the following command:

    root@bondigas:~# mkfs.ext4 /dev/md1p1 
    mke2fs 1.41.11 (14-Mar-2010)
    /dev/md1p1 alignment is offset by 63488 bytes.
    This may result in very poor performance, (re)-partitioning suggested.
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    Stride=16 blocks, Stripe width=48 blocks
    97853440 inodes, 391394047 blocks
    19569702 blocks (5.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=0
    11945 block groups
    32768 blocks per group, 32768 fragments per group
    8192 inodes per group
    Superblock backups stored on blocks: 
            32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
            4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
            102400000, 214990848
    
    Writing inode tables: ^C 27/11945
    root@bondigas:~# ^C
    

    I am unsure what to do about "/dev/md1p1 alignment is offset by 63488 bytes." and how to partition the array so that I can format it properly.

  • j-g-faustus
    @Marco How so? The first one, to IBM developerWorks, even has a benchmark graph of the write performance penalty for using unaligned partitions, and a sidebar on RAID. The blog post by Tso on SSD alignment has moved at least twice since I wrote this. I updated the link again, but there's no guarantee it will keep working.
  • j-g-faustus
    Alternative link on SSD: Aligning SSD partitions
  • Felipe Alvarez
    The parted documentation suggests using MB and GB, not MiB or GiB, if one wishes to allow parted to attempt to optimise partitions automatically.
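The MB vs MiB distinction matters because parted reads MB as 10^6 bytes and MiB as 2^20 bytes, and only the latter is a multiple of 4 KiB. As I understand the GNU parted manual, positions given in SI units such as MB are treated as approximate, leaving parted room to round to an aligned sector, while binary units such as MiB are taken exactly. A minimal sketch of the arithmetic (the helper name is hypothetical):

```python
MB = 10**6   # parted's "MB"
MIB = 2**20  # parted's "MiB"


def is_4k_aligned(offset_bytes: int) -> bool:
    """True if a byte offset falls on a 4 KiB boundary."""
    return offset_bytes % 4096 == 0


if __name__ == "__main__":
    for name, start in (("1 MB", MB), ("1 MiB", MIB)):
        print(name, start, "4 KiB aligned:", is_4k_aligned(start))
```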