Matching between raw and APFS disk number

Hello,

I have a raw device number as input for an external USB/Thunderbolt device (e.g. for /dev/disk9 it is 9), and I want to format the device as APFS, or read/confirm the APFS file system type if it is already formatted. However, I can see that the APFS container/volume has a different disk number (e.g. /dev/disk10) than my raw disk.

Is there a guaranteed way to match a raw disk number to the disk number of the underlying APFS container/volume? Maybe some API? Or can I just add 1 to the raw disk number to get the APFS container/volume disk number?

Answered by DTS Engineer in 854400022

I have a raw device number as input for an external USB/Thunderbolt device (e.g. for /dev/disk9 it is 9), and I want to format the device as APFS, or read/confirm the APFS file system type if it is already formatted. However, I can see that the APFS container/volume has a different disk number (e.g. /dev/disk10) than my raw disk.

Yes, and keep in mind that this is just one of many edge cases. IMHO, using the content/structure of a disk name to infer relationships between disks/volumes is a dangerous mistake: it works "well enough" to function under basic testing, but it will eventually fail catastrophically. Disk paths should all be treated as opaque values* which are retrieved through API, NOT strings which can be interpreted or built.

*Note that the same issue exists between volume names and volume mount paths. That is, assuming that the volume named "foo" is mounted at "/Volumes/foo" is also a dangerous programming error.

Making this point explicitly:

Or can I just add 1 to the raw disk number to get the APFS container/volume disk number?

Never do this. The issue isn't that it won't work, it's that it WILL work much of the time... until it doesn't and you destroy user data.

Is there a guaranteed way to match a raw disk number to the disk number of the underlying APFS container/volume? Maybe some API?

Yes, there are actually several options here depending on what you're actually trying to do and what you're "starting" with.

In terms of understanding what's actually "going on", I would recommend downloading and using IORegistryExplorer. You'll find it inside "Additional Tools for Xcode" from our "More Downloads" section. IORegistryExplorer is basically the GUI equivalent of ioreg and, in my experience, it makes it FAR easier to visualize and understand exactly how the IORegistry actually works. Two quick notes on it:

  1. The Additional Tools download is generated as part of our build process (this is why it's versioned), but most of the tools inside it don't actually change all that often, particularly a fundamental tool like IORegistryExplorer. You don't really need to worry about keeping it "up to date".

  2. IORegistryExplorer maintains a "live" view of the current registry state, using "green" for new objects and "red" for old/dead objects. This can be really handy, but it can also slow things down and/or cause the interface to reset while you're trying to use it. If that's getting in the way of your work, you can use "Save" to generate a "snapshot" of the current state, then open that file to view the static state. That avoids the performance issues (just close the "live" window) and also means you can share that file with other people or save it for later review. If you write to DTS with a driver issue, the first thing I will ask for is an IORegistryExplorer snapshot and, no, I won't take an ioreg text dump instead.

Once you've got a snapshot open, search for "IOMedia". Note that I'd actually recommend that you start by looking at a simple HFS+ or FAT device, not APFS, as they'll make the common architecture more obvious. Part of APFS's implementation exists in this IOKit layer, which means it uses custom subclasses like AppleAPFSVolumeBSDClient/AppleAPFSMedia and has a more multilayered layout than other volumes.

In any case, IOMedia is the base class used by the logical device layer, which is what actually:

  • Provides the generic block level I/O abstraction.

  • Interprets the partition map and creates separate I/O channels for different partitions.

  • Creates the actual dev nodes you're interacting with.

...so those IOKit objects represent the "truth" about the relationship between device nodes and physical media. On that last point, you'll note that there are "IOMediaBSDClient" objects attached to the various IOMedia objects. Those clients are what actually create and manage every disk node the system creates.

When you select an IOMedia object, there are a few different properties to pay attention to:

  1. "BSD Name" (string)-> the disk name that's in "/dev".

  2. "Leaf" (boolean)-> Whether or not this disk has child slices.

  3. "Whole" (boolean)-> Whether this node presents itself as an "entire" disk (diskX) or a "slice" (diskXsX).
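
For quick exploration outside IORegistryExplorer, the same three properties can also be read from a script. The following is only a minimal sketch, not an API recommendation: it assumes the registry has been dumped as an XML plist (for example with `ioreg -a -l`), that child entries are nested under an "IORegistryEntryChildren" key, and that any entry carrying both "BSD Name" and "Whole" is an IOMedia object or subclass. The function names are hypothetical.

```python
import plistlib


def load_registry(xml_bytes):
    """Parse an XML plist registry dump into a list of root dictionaries."""
    root = plistlib.loads(xml_bytes)
    return root if isinstance(root, list) else [root]


def media_properties(nodes, found=None):
    """Collect {bsd_name: {"whole": bool, "leaf": bool}} for every media entry."""
    found = {} if found is None else found
    for node in nodes:
        # Assumption: an entry with both keys is an IOMedia object (or subclass).
        if "BSD Name" in node and "Whole" in node:
            found[node["BSD Name"]] = {
                "whole": bool(node["Whole"]),
                "leaf": bool(node.get("Leaf", False)),
            }
        # Recurse through non-media entries too (partition schemes and so on).
        media_properties(node.get("IORegistryEntryChildren", []), found)
    return found
```

On a Mac, the input bytes could come from something like `subprocess.run(["ioreg", "-a", "-l"], capture_output=True, check=True).stdout`. Shipping code should read these properties through IOKit itself rather than by parsing tool output.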

Note that "Whole" can be misleading, as it's what creates this behavior:

However, I can see that the APFS container/volume has a different disk number (e.g. /dev/disk10) than my raw disk.

...as nodes can choose to present themselves as whole nodes, even though they're actually slices of a different parent node. However, you can use IOKit APIs to see what's actually going on in IOKit. With all that background, let me return to here:

Maybe some API?

The biggest questions here are:

  1. Are you starting from a volume, slice (diskXsX), or whole device (diskX)?

  2. What are you actually trying to do/use this information "for"?

The two key frameworks here are IOKit (which handles "devices") and DiskArbitration (which handles "volumes"), and your answer to the first question will really decide which of the APIs you "start" with.

The second question is important because it has a huge effect on the details of your implementation. For example, determining "what volumes are on this device" is relatively straightforward, as you can:

  1. Find the IOKit node for that device
  2. Find its child leaf nodes
  3. Use DiskArb to map those nodes to volumes.
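
Steps 1 and 2 above can be sketched against a parsed registry dump. This is an illustration under assumptions, not the IOKit implementation: the input is the dictionary tree obtained by loading `ioreg -a -l` output with `plistlib`, children are nested under "IORegistryEntryChildren", and the helper names are hypothetical. It also reports any nested "Whole" nodes, which is where a synthesized APFS container shows up beneath the raw disk.

```python
def find_media(nodes, bsd_name):
    """Return the first media entry whose "BSD Name" matches, else None."""
    for node in nodes:
        if node.get("BSD Name") == bsd_name and "Whole" in node:
            return node
        hit = find_media(node.get("IORegistryEntryChildren", []), bsd_name)
        if hit is not None:
            return hit
    return None


def media_below(node):
    """Return (leaves, nested_wholes): BSD names of media found beneath node."""
    leaves, wholes = [], []
    for child in node.get("IORegistryEntryChildren", []):
        if "BSD Name" in child and "Whole" in child:
            if child["Whole"]:
                # A whole node below another media node, e.g. an APFS container.
                wholes.append(child["BSD Name"])
            if child.get("Leaf"):
                leaves.append(child["BSD Name"])
        sub_leaves, sub_wholes = media_below(child)
        leaves += sub_leaves
        wholes += sub_wholes
    return leaves, wholes
```

For step 3, each leaf could then be handed to DiskArbitration (for example `DADiskCreateFromBSDName` followed by `DADiskCopyDescription`) to read its volume information; that part is not shown here.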

However, determining "what device(s) is this volume on" is significantly trickier. The problem here is that IOKit objects can attach to more than one parent (this is how RAID works) but the standard IOKit iteration APIs don't really account for that case. It's been many years since I last wrote code for this, but my recollection is that the right way to do this is:

  1. Get a DADisk for the volume
  2. Get the IOMedia object from the DADisk
  3. Use IOKit to get all Whole IOMedia objects
  4. Iterate the child nodes of all THOSE objects, looking for "your" media node (2).

However, there is a false positive issue here because APFS (and other edge cases, including RAID) can insert an "extra" whole node. So, while doing #4 you also need to filter out any intermediate whole nodes to remove that edge case.
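
The search in steps 3 and 4, including the filtering of intermediate whole nodes, can be sketched on a parsed registry dump. This is a rough model under assumptions rather than the IOKit code itself: the input is the full registry tree (for example `ioreg -a -l` loaded with `plistlib`), a node with several parents is assumed to appear beneath each of them in the dump, and the function name is hypothetical. Only the outermost whole node on each path is reported, so a synthesized APFS container or RAID set in the middle of the path is ignored.

```python
def backing_whole_disks(nodes, target, outermost=None, found=None):
    """Return the set of outermost whole-disk BSD names that contain target."""
    found = set() if found is None else found
    for node in nodes:
        is_media = "BSD Name" in node and "Whole" in node
        top = outermost
        # Remember only the first whole node seen on the way down; any whole
        # node below it is an intermediate one and is filtered out.
        if is_media and node["Whole"] and top is None:
            top = node["BSD Name"]
        if is_media and node["BSD Name"] == target and top is not None:
            found.add(top)
        backing_whole_disks(
            node.get("IORegistryEntryChildren", []), target, top, found
        )
    return found
```

In a real app, the target media would come from the DADisk (steps 1 and 2, for example via `DADiskCopyIOMedia`) and the walk would use IOKit registry iteration instead of a plist dump.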

Finally, the other issue here is that your larger goal will change what API you "anchor" your app around. Historically, most of the APIs I worked on were "device" oriented (for example, CD burning), so I tended to use IOKit to discover/track devices, then move "up" the stack to get user-relevant information (like what volume was on a device). However, a volume-oriented app (for example, a backup app) might be better off tracking volumes using DiskArb.

In any case, tell me more about what you're trying to do and I'll provide more specific guidance.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
