Read Me About EmptyFS.txt

Read Me About EmptyFS
=====================
1.0
 
EmptyFS is a very simple VFS plug-in.  EmptyFS volumes are completely empty (except for the root directory).  The purpose of this sample is to create the minimal VFS plug-in that builds and runs.  You can use EmptyFS to explore VFS behaviour, or as a template for your own VFS plug-in.
 
EmptyFS requires Mac OS X 10.4 or later.  Mac OS X 10.4 introduced a new programming model for VFS plug-ins, and EmptyFS is tied to this model.  It would be tricky to port this code to earlier versions of Mac OS X.
 
Packing List
------------
The sample contains the following items:
 
o Read Me About EmptyFS.txt -- This file.
o EmptyFS.xcodeproj -- An Xcode 2.4 project for the sample.
o EmptyFS.c -- Source code for the kernel extension.
o Info.plist -- A property list file for the kernel extension.
o MountEmptyFS.c -- Source code for the mount tool.
o EmptyFSMountArgs.h -- Definitions shared between the kernel extension and the mount tool.
o build -- A directory containing pre-built binaries.
 
Using the Sample
----------------
To use the sample, you must first decide what device node to mount on.  The simplest approach is to create a disk image.
 
$ hdiutil create -size 1MB -partitionType Apple_DTS_EmptyFS test
[...]
created: /Users/quinn/test.dmg
 
You then attach the disk image with the "-nomount" flag (so that the system doesn't automatically mount the volumes on the image).
 
$ hdiutil attach -nomount test.dmg 
/dev/disk1              Apple_partition_scheme         
/dev/disk1s1            Apple_partition_map            
/dev/disk1s2            Apple_DTS_EmptyFS                      
 
The last line that this prints (in this example it's "/dev/disk1s2", but it may be different on your system) is a suitable device node for this test.  
 
Now install the EmptyFS KEXT.  The following assumes that you downloaded the sample code archive to your desktop.
 
$ sudo cp -R ~/Desktop/EmptyFS/build/Debug/EmptyFS.kext /
$ sudo chown -R root:wheel /EmptyFS.kext
$ sudo kextload -t /EmptyFS.kext
kextload: extension /EmptyFS.kext appears to be valid
kextload: /EmptyFS.kext loaded successfully
 
At this point you can create a mount point and mount the file system.
 
$ mkdir MountPoint
$ ~/Desktop/EmptyFS/build/Debug/mount_EmptyFS /dev/disk1s2 MountPoint
 
You must change "/dev/disk1s2" to match the string printed by "hdiutil" earlier.
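Under the covers, "mount_EmptyFS" does little more than package up its arguments and call the mount system call.  The following is a minimal sketch of the general shape of such a tool; the real argument structure lives in "EmptyFSMountArgs.h" and the real code in "MountEmptyFS.c", so the structure, field names, and file system name string ("EmptyFS") shown here are illustrative assumptions rather than a copy of the sample.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mount.h>
    
    /* Hypothetical argument structure; the real one is in EmptyFSMountArgs.h. */
    struct MyEmptyFSMountArgs {
        const char *    devNodePath;        /* for example, "/dev/disk1s2" */
    };
    
    int main(int argc, char **argv)
    {
        struct MyEmptyFSMountArgs   args;
    
        if (argc != 3) {
            fprintf(stderr, "usage: %s device-node mount-point\n", argv[0]);
            return EXIT_FAILURE;
        }
        memset(&args, 0, sizeof(args));
        args.devNodePath = argv[1];
    
        /* The first argument must match the VFS name that the KEXT registers. */
        if ( mount("EmptyFS", argv[2], MNT_RDONLY, &args) < 0 ) {
            perror("mount");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }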
 
The "mount" command will now show the file system as mounted.
 
$ mount | grep MountPoint
/dev/disk1s2 on /Users/quinn/MountPoint (local, nodev, noexec, nosuid, read-only, mounted by quinn)
 
You can list its root directory.
 
$ ls -alh MountPoint
total 8
dr-xr-xr-x    2 quinn  quinn  528B Jan  1  1970 .
drwxr-xr-x   30 quinn  quinn  1020B Jul  4 14:13 ..
 
You can also use the "stat" command to get detailed information about the EmptyFS root directory.
 
$ stat MountPoint
234881032 2 dr-xr-xr-x 2 quinn quinn 0 528 "Jan  1 01:00:00 1970" "Jan  1 01:00:00 1970" "Jan  1 01:00:00 1970" 4096 8 0 MountPoint
 
Finally, you can unmount the file system and then unload the KEXT.
 
$ umount MountPoint
$ sudo kextunload /EmptyFS.kext 
kextunload: unload kext /EmptyFS.kext succeeded
 
Building the Sample
-------------------
The sample was built using Xcode 2.4 on Mac OS X 10.4.7.  You should be able to just open the project, select the "All" target, and choose Build from the Build menu.  This will build the "EmptyFS.kext" kernel extension and the "mount_EmptyFS" command line tool, both in the "build" directory.
 
Notes
-----
The source code has extensive comments that I won't repeat here.  If you want information about how the code works, you should start by reading those comments.
 
To keep things as simple as possible, this file system has been designed and tested at the BSD level only.  This has a couple of consequences:
 
o I've made no attempt to support File Manager in this project.
 
o Similarly, I have not wrapped the KEXT in a bundle that's suitable for installation in "/System/Library/Filesystems".
 
Thus, you should not expect EmptyFS volumes to automatically show up on the desktop.
 
Build settings are tricky for any KEXT.  I have used a number of techniques that you might find interesting.
 
o As with all my projects, I put the bulk of my build settings in the project build settings panel, from where they are inherited by all targets.
 
o I preprocess the KEXT's "Info.plist" file by enabling the INFOPLIST_PREPROCESS setting in the "KEXT" target.  I do this for two reasons:
 
  - It allows the Info.plist file to pick up compile-time variables (KEXT_BUNDLE_ID and KEXT_VERSION) from the INFOPLIST_PREPROCESSOR_DEFINITIONS build setting, which in turn picks them up from MODULE_NAME and MODULE_VERSION build settings.  Thus, you can change these settings in one place (the target build settings panel) and they propagate to all of the relevant places.
 
  - It allows me to conditionally express a dependency on the "com.apple.kpi.unsupported" KPI.  I need this KPI in my debug build (for lck_mtx_assert), but I don't want my release build to depend on it.
 
o In the release build, I strip all non-exported symbols using the STRIP_STYLE build setting.  This is very important for KEXTs because the kernel is a single flat namespace.
 
Vnode Locking and Reference Counts
----------------------------------
VFS handles the locking and reference counting needed for the structures it manages (mount points, vnodes, and so on).  Your VFS plug-in can decide how to lock the structures it manages; VFS does not impose locking requirements on your plug-in.  You can use mutexes or read/write locks explicitly, which gives you maximum concurrency, or you can let VFS do most of your locking for you, which makes your code simpler.
 
If you let VFS do locking for you, VFS will acquire a "funnel" (a type of mutex) before calling your VFS plug-in's entry points, and release it after your code returns.  The funnel is released and reacquired when your VFS plug-in blocks, such as when allocating memory.  The funnel is not released due to preemption.  The funnel is sufficient to protect your global variables as long as your code does not block (for example, insertions and removals from linked lists or hashes don't block, and are thus protected by the funnel).
 
    Note
    The funnel is the same mechanism that was used in older versions of 
    Mac OS X, but in Mac OS X 10.4 and later is used only for VFS plug-ins 
    that request it.
 
    IMPORTANT
    Use of the VFS funnel will be deprecated in the next major release of 
    Mac OS X.  If you're creating a new VFS plug-in, we recommend that 
    you design your plug-in to not need funnel support.
 
If you let VFS do the locking for you, it will also acquire a per-vnode mutex for every vnode passed in to your VFS plug-in, and release the mutex when your code returns to VFS.  You can use this to protect your per-vnode data structures (what we call an FSNode).
 
Even if you use VFS's funnel and per-vnode locks, your VFS plug-in will still need to provide explicit protection for shared structures that are accessed across calls that may block.  In some cases (such as your FSNode hash table), you can rely on the funnel to set and clear flags, and use msleep and wakeup to manage synchronization.  In other cases, you may want to use your own mutex or read/write lock.  For example, if you maintain a per-volume data structure to record which blocks are allocated on disk, you could protect that data structure with a mutex.  Alternatively, you could manipulate the data structure within a buffer cache entry, and exploit the synchronization guarantees provided by the buffer cache.
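For instance, here is a minimal sketch of protecting a hypothetical per-volume allocation bitmap with a mutex from the lck_mtx family in <kern/locks.h>.  All of the structure and function names here are invented for illustration; this is not code from EmptyFS.

    #include <stdint.h>
    #include <kern/locks.h>
    
    static lck_grp_t *  gMyLockGroup;   /* created once, when the KEXT starts */
    
    /* Hypothetical per-volume state; field names are illustrative only. */
    struct MyVolume {
        lck_mtx_t * allocMtx;           /* protects allocBitmap */
        uint8_t *   allocBitmap;
        uint32_t    blockCount;
    };
    
    static void MyFSStart(void)
    {
        gMyLockGroup = lck_grp_alloc_init("com.example.MyFS", LCK_GRP_ATTR_NULL);
    }
    
    static void MyVolumeInitLocks(struct MyVolume *vol)
    {
        vol->allocMtx = lck_mtx_alloc_init(gMyLockGroup, LCK_ATTR_NULL);
    }
    
    /* Mark a block as allocated; safe to call from any vnode operation, even 
       if other parts of that operation block. */
    static void MyVolumeAllocBlock(struct MyVolume *vol, uint32_t block)
    {
        lck_mtx_lock(vol->allocMtx);
        vol->allocBitmap[block / 8] |= (uint8_t) (1 << (block % 8));
        lck_mtx_unlock(vol->allocMtx);
    }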
 
Whether or not you use VFS-provided locking, VFS makes some guarantees that simplify your VFS plug-in.  For example, when a volume is being unmounted, VFS waits for previous operations on the volume (and its vnodes) to complete before starting the unmount, and will not let any new operations on that volume begin until the unmount completes or fails.  Also, vnodes are not reused by other VFS plug-ins until all references to the vnode are released.
 
VFS keeps track of three kinds of references to vnodes, each with different meanings.  VFS uses these references to determine whether a vnode is "busy" or "in use".
 
The first type of reference is an "I/O reference", also known as an "io_count reference".  An I/O reference indicates that the vnode is actively involved in handling some file system operation.  You acquire an I/O reference by calling vnode_get or vnode_getwithvid, and release the reference using vnode_put.  The vnode_create routine creates a vnode with a single I/O reference.
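For concreteness, here is a rough sketch of creating a root directory vnode with vnode_create.  The FSNode structure, the operations table name, and most of the parameter choices are illustrative assumptions; the sample's real code is in EmptyFS.c.

    #include <string.h>
    #include <sys/mount.h>
    #include <sys/vnode.h>
    
    extern int (**gMyVnodeOps)(void *);     /* your vnode operations table */
    
    /* Hypothetical per-file structure. */
    struct MyFSNode {
        vnode_t     vnode;
        uint32_t    vnodeID;                /* vnode_vid at creation time */
    };
    
    static errno_t MyCreateRootVNode(mount_t mp, struct MyFSNode *fsNode, vnode_t *vnPtr)
    {
        struct vnode_fsparam    params;
        errno_t                 err;
    
        memset(&params, 0, sizeof(params));
        params.vnfs_mp       = mp;
        params.vnfs_vtype    = VDIR;
        params.vnfs_str      = "MyFS";
        params.vnfs_fsnode   = fsNode;
        params.vnfs_vops     = gMyVnodeOps;
        params.vnfs_markroot = 1;           /* this is the volume's root directory */
        params.vnfs_flags    = VNFS_NOCACHE | VNFS_CANTCACHE;  /* stay out of the name cache */
    
        /* On success the new vnode comes back with a single I/O reference, 
           which the caller must eventually drop with vnode_put. */
        err = vnode_create(VNCREATE_FLAVOR, VCREATESIZE, &params, vnPtr);
        if (err == 0) {
            fsNode->vnode   = *vnPtr;
            fsNode->vnodeID = vnode_vid(*vnPtr);
        }
        return err;
    }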
 
VFS uses I/O references to determine when a vnode is "busy".  A volume cannot be unmounted while any of its vnodes has an I/O reference.   An I/O reference should never be held beyond the duration of a single file system operation.  
 
Vnodes passed to your vnode operations by the VFS layer already carry an I/O reference.  The cache lookup routine returns a vnode with an I/O reference.  A vnode that you retrieve from your file system's hash table must also be returned with an I/O reference.
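Here is a minimal sketch of retrieving a vnode from such a hash table with an I/O reference, using vnode_getwithvid.  The FSNode structure and helper names are invented for illustration, and locking of the hash table itself is omitted for brevity.

    #include <sys/errno.h>
    #include <sys/vnode.h>
    
    struct MyFSNode {
        vnode_t     vnode;
        uint32_t    vnodeID;        /* recorded with vnode_vid when the vnode was created */
    };
    
    /* Hypothetical hash lookup; returns NULL if the file has no FSNode. */
    extern struct MyFSNode * MyHashLookup(uint32_t inodeNumber);
    
    static errno_t MyGetVNode(uint32_t inodeNumber, vnode_t *vnPtr)
    {
        struct MyFSNode *   fsNode;
        errno_t             err;
    
        fsNode = MyHashLookup(inodeNumber);
        if (fsNode == NULL) {
            return ENOENT;          /* the caller would create a fresh vnode here */
        }
    
        /* vnode_getwithvid takes an I/O reference only if the vnode still has 
           the ID we recorded; if the vnode has been reclaimed and reused, it 
           fails and the caller must rebuild the vnode from scratch. */
        err = vnode_getwithvid(fsNode->vnode, fsNode->vnodeID);
        if (err == 0) {
            *vnPtr = fsNode->vnode; /* returned with an I/O reference held */
        }
        return err;
    }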
 
The second type of reference is a "use count reference".  A use count reference represents a long-term interest in a vnode.  You acquire a use count reference by calling vnode_ref, and release the reference using vnode_rele.  You should hold an I/O reference on the vnode when you acquire or release a use count reference.  VFS uses use count references to determine when a vnode is "in use".  A use count reference may be (and often is) held beyond the duration of a single file system operation.  For example, if your private mount point structure contains a pointer to a device vnode, you should acquire a use count reference on that vnode when you store it in the mount point, and release the reference when you change that mount point field or deallocate the mount point structure.
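A minimal sketch of that device vnode pattern, with invented structure and function names:

    #include <sys/vnode.h>
    
    /* Hypothetical per-mount structure. */
    struct MyMount {
        vnode_t     devVNode;       /* the device we are mounted on */
    };
    
    /* Called during mount; devvp arrives with an I/O reference held. */
    static errno_t MyMountRememberDevice(struct MyMount *mtmp, vnode_t devvp)
    {
        errno_t     err;
    
        /* Take a long-term (use count) reference before storing the pointer; 
           this requires that the caller hold an I/O reference on devvp. */
        err = vnode_ref(devvp);
        if (err == 0) {
            mtmp->devVNode = devvp;
        }
        return err;
    }
    
    /* Called during unmount. */
    static void MyMountForgetDevice(struct MyMount *mtmp)
    {
        if (mtmp->devVNode != NULL) {
            vnode_rele(mtmp->devVNode);
            mtmp->devVNode = NULL;
        }
    }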
 
Both I/O and use count references are counted.  That is, the number of releases must match the number of acquisitions.  If the reference has been acquired more times than it has been released, then the vnode is considered "busy" or "in use".  The counts are not allowed to be negative; that is, you cannot release a reference first in order to balance a future acquisition.
 
Both I/O and use count references prevent a vnode from being reused by another file or file system.  That is, you will not receive a VNOPInactive or VNOPReclaim call for a vnode until it has no I/O or use count references.  A volume cannot be unmounted if any of its vnodes have I/O or use count references.
 
The third type of reference is an FS reference (not to be confused with File Manager's FSRef).  An FS reference is weak in that it does not prevent a vnode from being reused.  It indicates that the file system has stored a pointer to the vnode, but can handle the vnode being reused.  An FS reference is acquired with vnode_addfsref, and released with vnode_removefsref.  An FS reference is typically acquired when a vnode (actually, its corresponding FSNode) is inserted into a hash table, and released when the vnode is removed from the table (typically as the result of a VNOPReclaim operation).  FS references are Boolean, not counted, so a single release balances any number of acquisitions.
 
    Important
    Don't forget to call vnode_removefsref from your VNOPReclaim routine.  
    If you forget, the system will panic later, when that vnode is reused for 
    some other file.  When testing, you can use "find -x / >& /dev/null" as a 
    handy way to cycle through lots of vnodes, forcing any inactive (not busy, 
    not used) vnode to be reused.
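Here is a sketch of the acquire and release sides of that pattern, again with invented helper names; the reclaim routine shows where vnode_removefsref belongs.

    #include <sys/vnode.h>
    
    struct MyFSNode {
        vnode_t     vnode;
        uint32_t    vnodeID;
    };
    
    /* Hypothetical hash and memory helpers. */
    extern void MyHashInsert(struct MyFSNode *fsNode);
    extern void MyHashRemove(struct MyFSNode *fsNode);
    extern void MyFreeFSNode(struct MyFSNode *fsNode);
    
    /* Called after vnode_create, when the FSNode goes into the hash. */
    static void MyPublishFSNode(struct MyFSNode *fsNode, vnode_t vn)
    {
        fsNode->vnode   = vn;
        fsNode->vnodeID = vnode_vid(vn);
        vnode_addfsref(vn);             /* weak reference: the hash points at vn */
        MyHashInsert(fsNode);
    }
    
    /* The guts of a reclaim-style vnode operation. */
    static int MyReclaim(vnode_t vn)
    {
        struct MyFSNode *   fsNode;
    
        fsNode = vnode_fsnode(vn);
        if (fsNode != NULL) {
            MyHashRemove(fsNode);
            vnode_removefsref(vn);      /* forgetting this leads to a later panic */
            vnode_clearfsnode(vn);      /* break the vnode -> FSNode link */
            MyFreeFSNode(fsNode);
        }
        return 0;
    }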
 
Credits and Version History
---------------------------
If you find any problems with this sample, mail <dts@apple.com> and I'll try to fix them up.
 
1.0 (Oct 2006) was the first shipping version.
 
Share and Enjoy.
 
Apple Developer Technical Support
Core OS/Hardware
 
31 Oct 2006
 
$Log: Read\040Me\040About\040EmptyFS.txt,v $
Revision 1.9  2006/10/31 16:28:47  eskimo1
Updated the "Vnode Locking and Reference Counts" section based on review feedback.
 
Revision 1.8  2006/10/20 23:08:08  eskimo1
Corrected partition type in hdiutil attach command results.
 
Revision 1.7  2006/10/17 13:57:08  eskimo1
Set the partition type when creating the disk image.
 
Revision 1.6  2006/10/10 21:13:51  eskimo1
Change group name.
 
Revision 1.5  2006/10/09 14:08:21  eskimo1
Noted the deprecation of VFS funnels, plus some copy edit changes.
 
Revision 1.4  2006/08/01 00:14:06  eskimo1
Corrected the disk device path in one of the example command lines.
 
Revision 1.3  2006/07/25 16:27:10  eskimo1
Rolled in changes based on experience from MFSLives.  Almost all of these were updated comments.
 
Revision 1.2  2006/07/04 14:10:54  eskimo1
Correct the paths in the example.
 
Revision 1.1  2006/07/04 14:04:06         
First checked in.