
Retired Document

Important: This sample code may not represent best practices for current development. The project may use deprecated symbols and illustrate technologies and techniques that are no longer recommended.

EmptyFS.c

/*

    File:       EmptyFS.c

    Contains:   A basic VFS plug-in example.

    Written by: DTS

    Copyright:  Copyright (c) 2006 by Apple Computer, Inc., All Rights Reserved.

    Disclaimer: IMPORTANT:  This Apple software is supplied to you by Apple Computer, Inc.

                ("Apple") in consideration of your agreement to the following terms, and your

                use, installation, modification or redistribution of this Apple software

                constitutes acceptance of these terms.  If you do not agree with these terms,

                please do not use, install, modify or redistribute this Apple software.

                In consideration of your agreement to abide by the following terms, and subject

                to these terms, Apple grants you a personal, non-exclusive license, under Apple's

                copyrights in this original Apple software (the "Apple Software"), to use,

                reproduce, modify and redistribute the Apple Software, with or without

                modifications, in source and/or binary forms; provided that if you redistribute

                the Apple Software in its entirety and without modifications, you must retain

                this notice and the following text and disclaimers in all such redistributions of

                the Apple Software.  Neither the name, trademarks, service marks or logos of

                Apple Computer, Inc. may be used to endorse or promote products derived from the

                Apple Software without specific prior written permission from Apple.  Except as

                expressly stated in this notice, no other rights or licenses, express or implied,

                are granted by Apple herein, including but not limited to any patent rights that

                may be infringed by your derivative works or by other works in which the Apple

                Software may be incorporated.

                The Apple Software is provided by Apple on an "AS IS" basis.  APPLE MAKES NO

                WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED

                WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR

                PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND OPERATION ALONE OR IN

                COMBINATION WITH YOUR PRODUCTS.

                IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL OR

                CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE

                GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)

                ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION, MODIFICATION AND/OR DISTRIBUTION

                OF THE APPLE SOFTWARE, HOWEVER CAUSED AND WHETHER UNDER THEORY OF CONTRACT, TORT

                (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN

                ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    Change History (most recent first):

$Log: EmptyFS.c,v $

Revision 1.4  2006/10/31 16:27:46  eskimo1

Updated some comments based on review feedback (and corrected the AssertKnownFlags in VFSOPStart).

Revision 1.3  2006/07/25 16:38:06  eskimo1

Disable all name caching.  Added uiomove_atomic that checks to see whether we have enough space to copy out the entire dirent.

Revision 1.2  2006/07/25 16:27:07  eskimo1

Rolled in changes based on experience from MFSLives.  Almost all of these were updated comments.

Revision 1.1  2006/07/04 14:03:52  eskimo1

First checked in.

*/

/////////////////////////////////////////////////////////////////////

#include "EmptyFSMountArgs.h"

#include <kern/assert.h>

#include <libkern/libkern.h>

#include <libkern/OSMalloc.h>

#include <libkern/locks.h>

#include <mach/mach_types.h>

#include <sys/errno.h>

#include <sys/mount.h>

#include <sys/vnode.h>

#include <sys/vnode_if.h>

#include <sys/kernel_types.h>

#include <sys/stat.h>

#include <sys/dirent.h>

#include <sys/proc.h>

#include <sys/fcntl.h>

/////////////////////////////////////////////////////////////////////

#pragma mark ***** Source Code Notes

/*

    Bit Fields

    ----------

    In places where I initialise a bit field, I include both the active bits

    and the inactive bits (commented out).  This lets you quickly see all of

    the options that are available and the options that I've specifically enabled.

    Terminology

    -----------

    Each volume is made up of a set of file system objects (fsobjs).  These objects

    are stored on disk (or in some other way, such as across the network).  To speed

    things up, the system caches information about these file system objects in

    memory.  The objects in this cache are called vnodes.  The cache is managed by

    the VFS layer and the VFS plug-in, working in concert.

    This cache is /not/ the disk cache (in the traditional sense of the phrase).

    A disk cache typically caches the contents of blocks on the disk.  Here we're

    referring to a cache of information about the file system objects on the volume.

    Mac OS X does have a disk cache (called the Unified Buffer Cache, UBC), and this

    example interacts with it when it needs to read directory blocks (using the

    buf_meta_bread call) and when it reads files (using the cluster_read and

    cluster_pagein calls).

    A vnode is a virtual representation of a file system object.  It's virtual in

    the sense that it has no information about the concrete implementation of the

    object on disk (or across the network).  Rather, it's the handle which the

    higher levels of the system use to learn about and manipulate a given file

    system object.  The only concrete information about the file system object

    that is stored in the vnode is a reference to the corresponding FSNode.

    An FSNode is the in-memory representation of a file system object.  An FSNode

    is managed by the VFS plug-in, and contains all of the concrete information

    needed to manage that file system object.  For example, on HFS Plus the FSNode

    would store the CNID of the file system object.

    We don't use "inode" at all, for two reasons:

      o Traditionally, the term "inode" has been used to describe both the

        on-disk representation of a file system object /and/ the

        in-memory representation of that object (if it's being cached in memory).

        That's just confusing (-:

      o The term "inode" implies a certain style of on-disk organisation, which is

        not universally applicable (for an obvious example, consider a network

        file system), and is certainly not applicable to MFS.

    Traditionally there is a one-to-one correspondence between vnodes and FSNodes.

    However, this is not true in the presence of multi-fork files, where there is

    one vnode for each fork but all of these refer to the same FSNode.

    FSNode Hash

    -----------

    It's important to realise that the vnode cache is managed globally by the

    VFS layer.  The VFS plug-in is expected to follow along with decisions

    made by the VFS layer.  However, vnodes are created by the VFS plug-ins,

    as they respond to incoming requests.

    The most common situation where a VFS plug-in needs to create a vnode is

    in VNOPLookup.  In this case, the plug-in has information about the file

    system object in question (in this example, we have the file number) and

    needs to create a vnode to return as the result of the lookup.

    The critical point is that the VFS plug-in MUST NOT create two vnodes

    for the same file.  Therefore the plug-in must maintain some data structure

    that:

      o can be accessed quickly based on the information in the file system

        object's directory entry (that is, the file number)

      o tells the VFS plug-in which file system objects are currently in memory

      o can return the vnode, if any, associated with that FSNode

    This is typically done using a hash table that indexes all of the FSNodes.

    This is keyed by the file system object's raw device number (dev_t) and

    inode number (file number in the case of MFS).  Getting the mechanics of

    this table right is the most difficult part of implementing a VFS plug-in.

    In the case of EmptyFS, there can only be one possible vnode (the root

    vnode) and thus we don't need a hash table.  Rather, we store information

    about the root vnode in the mount point itself.  Also, we don't actually

    need an FSNode data structure, because we don't need any state for our

    file system objects.

*/
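/*
    FSNode Hash Sketch
    ------------------
    EmptyFS itself needs no hash table, so the following is only a minimal
    userland sketch of the lookup structure described in the notes above.  Every
    name in it (SketchFSNode, SketchHashFind, SketchHashInsert, and so on) is
    hypothetical and does not appear in EmptyFS.  It assumes 32-bit stand-ins for
    the raw dev_t and the file number, and it deliberately omits the locking and
    the "attaching" handshake that a real VFS plug-in would need.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland stand-ins; none of these names come from EmptyFS. */
enum { kSketchBucketCount = 64 };

typedef struct SketchFSNode {
    struct SketchFSNode *   next;       /* next node in the same bucket */
    uint32_t                dev;        /* stands in for the raw dev_t */
    uint32_t                fileNum;    /* file number (inode number) */
    void *                  vnode;      /* associated vnode, or NULL if none */
} SketchFSNode;

static SketchFSNode *gSketchBuckets[kSketchBucketCount];

static size_t SketchBucketIndex(uint32_t dev, uint32_t fileNum)
    /* Maps the (dev, fileNum) key to a bucket. */
{
    return (size_t) ((dev * 31u) ^ fileNum) % kSketchBucketCount;
}

static SketchFSNode *SketchHashFind(uint32_t dev, uint32_t fileNum)
    /* Returns the node for (dev, fileNum), or NULL if it is not in memory. */
{
    SketchFSNode *node;

    for (node = gSketchBuckets[SketchBucketIndex(dev, fileNum)]; node != NULL; node = node->next) {
        if ( (node->dev == dev) && (node->fileNum == fileNum) ) {
            return node;
        }
    }
    return NULL;
}

static void SketchHashInsert(SketchFSNode *node)
    /* Adds a node.  The caller must already have checked that the key is */
    /* absent, which is what prevents two nodes for the same file. */
{
    size_t i;

    i = SketchBucketIndex(node->dev, node->fileNum);
    assert(SketchHashFind(node->dev, node->fileNum) == NULL);
    node->next = gSketchBuckets[i];
    gSketchBuckets[i] = node;
}
```

/*
    A lookup routine would call SketchHashFind first and only create (and
    insert) a new node when it returns NULL.
*/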

/////////////////////////////////////////////////////////////////////

#pragma mark ***** More Asserts

// We use the system assert macro (from <kern/assert.h>) for standard asserts.

// In some cases we also want to assert that an incoming 'flags' parameter

// has only the bits that we know about set.  In this case we use the

// AssertKnownFlags macro.  As getting an unknown flag is more of a warning

// than an error, we just print a message and continue execution.

#if MACH_ASSERT

    static void AssertKnownFlagsCore(
        uint64_t        flags,
        uint64_t        knownFlags,
        boolean_t *     havePrintedPtr,
        const char *    fileStr,
        int             lineNumber,
        const char *    flagsStr,
        const char *    knownFlagsStr
    )
        // Core implementation of AssertKnownFlags.
    {
        // Check to see if we have any unknown flags.

        if ( (flags & ~knownFlags) != 0 ) {

            // If so, have we already printed a warning.

            if ( (havePrintedPtr == NULL) || ! *havePrintedPtr ) {

                // If not, print it.

                printf("%s:%d: AssertKnownFlags(%s, %s) saw unknown flags 0x%llx.\n",
                    fileStr,
                    lineNumber,
                    flagsStr,
                    knownFlagsStr,
                    flags & ~knownFlags
                );
            }

            // And record that we did.

            if (havePrintedPtr != NULL) {
                *havePrintedPtr = TRUE;
            }
        }
    }

    // In AssertKnownFlags macro, flags is the incoming flags and

    // knownFlags is the set of all flags that we knew about when we

    // wrote the code.

    #define AssertKnownFlags(flags, knownFlags) \

        do {                                    \

            static boolean_t sHavePrinted;      \

            AssertKnownFlagsCore((flags), (knownFlags), &sHavePrinted, __FILE__, __LINE__, # flags, # knownFlags); \

        } while (0)

#else

    #define AssertKnownFlags(flags, knownFlags) do { } while (0)

#endif

/////////////////////////////////////////////////////////////////////

#pragma mark ***** Error Conversion

static errno_t ErrnoFromKernReturn(kern_return_t kernErr)
    // Maps a kern_return_t-style error into an errno_t-style error.
{
    errno_t err;

    if (kernErr == KERN_SUCCESS) {
        err = 0;
    } else {
        err = EINVAL;
    }
    return err;
}

static kern_return_t KernReturnFromErrno(errno_t err)
    // Maps an errno_t-style error into a kern_return_t-style error.
{
    kern_return_t kernErr;

    if (err == 0) {
        kernErr = KERN_SUCCESS;
    } else {
        kernErr = KERN_FAILURE;
    }
    return kernErr;
}

/////////////////////////////////////////////////////////////////////

#pragma mark ***** Memory and Locks

// gOSMallocTag is used for all of our allocations.

static OSMallocTag  gOSMallocTag = NULL;

// gLockGroup is used for all of our locks.

static lck_grp_t *  gLockGroup = NULL;

static void TermMemoryAndLocks(void)
    // Disposes of gOSMallocTag and gLockGroup.
{
    if (gLockGroup != NULL) {
        lck_grp_free(gLockGroup);
        gLockGroup = NULL;
    }
    if (gOSMallocTag != NULL) {
        OSMalloc_Tagfree(gOSMallocTag);
        gOSMallocTag = NULL;
    }
}

static kern_return_t InitMemoryAndLocks(void)
    // Initialises gOSMallocTag and gLockGroup.
{
    kern_return_t   err;

    err = KERN_SUCCESS;
    gOSMallocTag = OSMalloc_Tagalloc("com.apple.dts.kext.EmptyFS", OSMT_DEFAULT);
    if (gOSMallocTag == NULL) {
        err = KERN_FAILURE;
    }
    if (err == KERN_SUCCESS) {
        gLockGroup = lck_grp_alloc_init("com.apple.dts.kext.EmptyFS", LCK_GRP_ATTR_NULL);
        if (gLockGroup == NULL) {
            err = KERN_FAILURE;
        }
    }

    // Clean up.

    if (err != KERN_SUCCESS) {
        TermMemoryAndLocks();
    }

    assert( (err == KERN_SUCCESS) == (gOSMallocTag != NULL) );
    assert( (err == KERN_SUCCESS) == (gLockGroup   != NULL) );

    return err;
}

/////////////////////////////////////////////////////////////////////

#pragma mark ***** Core Data Structures

// gVNodeOperations is set up when we register the VFS plug-in with vfs_fsadd.

// It holds a pointer to the array of vnode operation functions for this

// VFS plug-in.  We have to declare it early in this file because it's referenced

// by the code that creates vnodes.

static errno_t (**gVNodeOperations)(void *);

// EmptyFSMount holds the file system specific data that we need per mount point.

// We attach this to the kernel mount_t by calling vfs_setfsprivate in VFSOPMount.

// There is no reference count on this structure; it lives and dies along with the

// corresponding mount_t.

enum {

    kEmptyFSMountMagic    = 'MtMn',

    kEmptyFSMountBadMagic = 'M!Mn'

};

struct EmptyFSMount {

    uint32_t        fMagic;             // [1] must be kEmptyFSMountMagic

    mount_t         fMountPoint;        // [1] back pointer to the mount_t

    uint32_t        fDebugLevel;        // [1] [3] debug level from mount arguments

    dev_t           fBlockRDevNum;      // [1] raw dev_t of the device we're mounted on

    vnode_t         fBlockDevVNode;     // [1] a vnode for the above; we have a use count reference on this

    char            fVolumeName[30];    // [1] volume name (UTF-8)

    struct vfs_attr fAttr;              // [1] pre-calculate volume attributes

    lck_mtx_t *     fRootMutex;         // [1] protects following fields

    boolean_t       fRootAttaching;     // [2] true if someone is attaching a root vnode

    boolean_t       fRootWaiting;       // [2] true if someone is waiting for such an attach to complete

    vnode_t         fRootVNode;         // [2] the root vnode; we hold /no/ proper references to this,

                                        //     and must reconfirm its existence each time

};

typedef struct EmptyFSMount EmptyFSMount;

// Root VNode Notes

// ----------------

// In a typical VFS plug-in, the root vnode is accessed via the hash layer, exactly

// like any other vnode.  In this trivial file system, I haven't implemented a hash

// layer (simply because I don't need it), thus I store the root vnode information

// in the mount point.

// Other Notes

// -----------

// [1] This field is immutable.  That is, it's set up as part of the initialisation

//     process, and is not modified after that.  Thus, it doesn't need to be

//     protected from concurrent access.

//

// [2] This field is protected by the fRootMutex lock.

//

// [3] fDebugLevel isn't really used.  I've included it for two reasons:

//     a) if you use EmptyFS as a template for your own VFS plug-in, it will be useful

//        to have a handy debug switch

//     b) it's a good example of how to pass information from your mount tool to your

//        KEXT

static EmptyFSMount *   EmptyFSMountFromMount(mount_t mp)
    // Gets the EmptyFSMount from a mount_t.
{
    EmptyFSMount *  result;

    assert(mp != NULL);
    result = vfs_fsprivate(mp);
    assert(result != NULL);
    assert(result->fMagic == kEmptyFSMountMagic);
    assert(result->fMountPoint == mp);
    return result;
}

static void EmptyFSMountInitGetAttrListGoop(EmptyFSMount *mtmp)
    // Initialises the f_capabilities and f_attributes fields of the
    // fAttr field of the EmptyFSMount with the appropriate static values.
    // This is in a separate routine because it's so big; I didn't want
    // to confuse EmptyFSInitAttr with all of this stuff.
{
    mtmp->fAttr.f_capabilities.capabilities[VOL_CAPABILITIES_FORMAT]     = 0
//      | VOL_CAP_FMT_PERSISTENTOBJECTIDS
//      | VOL_CAP_FMT_SYMBOLICLINKS
//      | VOL_CAP_FMT_HARDLINKS
//      | VOL_CAP_FMT_JOURNAL
//      | VOL_CAP_FMT_JOURNAL_ACTIVE
        | VOL_CAP_FMT_NO_ROOT_TIMES
//      | VOL_CAP_FMT_SPARSE_FILES
//      | VOL_CAP_FMT_ZERO_RUNS
        | VOL_CAP_FMT_CASE_SENSITIVE
        | VOL_CAP_FMT_CASE_PRESERVING
        | VOL_CAP_FMT_FAST_STATFS
        | VOL_CAP_FMT_2TB_FILESIZE
        ;
    mtmp->fAttr.f_capabilities.valid[VOL_CAPABILITIES_FORMAT]            = 0
        | VOL_CAP_FMT_PERSISTENTOBJECTIDS
        | VOL_CAP_FMT_SYMBOLICLINKS
        | VOL_CAP_FMT_HARDLINKS
        | VOL_CAP_FMT_JOURNAL
        | VOL_CAP_FMT_JOURNAL_ACTIVE
        | VOL_CAP_FMT_NO_ROOT_TIMES
        | VOL_CAP_FMT_SPARSE_FILES
        | VOL_CAP_FMT_ZERO_RUNS
        | VOL_CAP_FMT_CASE_SENSITIVE
        | VOL_CAP_FMT_CASE_PRESERVING
        | VOL_CAP_FMT_FAST_STATFS
        | VOL_CAP_FMT_2TB_FILESIZE
        ;
    mtmp->fAttr.f_capabilities.capabilities[VOL_CAPABILITIES_INTERFACES] = 0
//      | VOL_CAP_INT_SEARCHFS
        | VOL_CAP_INT_ATTRLIST
//      | VOL_CAP_INT_NFSEXPORT
//      | VOL_CAP_INT_READDIRATTR
//      | VOL_CAP_INT_EXCHANGEDATA
//      | VOL_CAP_INT_COPYFILE
//      | VOL_CAP_INT_ALLOCATE
//      | VOL_CAP_INT_VOL_RENAME
//      | VOL_CAP_INT_ADVLOCK
//      | VOL_CAP_INT_FLOCK
//      | VOL_CAP_INT_EXTENDED_SECURITY
//      | VOL_CAP_INT_USERACCESS
        ;
    mtmp->fAttr.f_capabilities.valid[VOL_CAPABILITIES_INTERFACES]        = 0
        | VOL_CAP_INT_SEARCHFS
        | VOL_CAP_INT_ATTRLIST
        | VOL_CAP_INT_NFSEXPORT
        | VOL_CAP_INT_READDIRATTR
        | VOL_CAP_INT_EXCHANGEDATA
        | VOL_CAP_INT_COPYFILE
        | VOL_CAP_INT_ALLOCATE
        | VOL_CAP_INT_VOL_RENAME
        | VOL_CAP_INT_ADVLOCK
        | VOL_CAP_INT_FLOCK
        | VOL_CAP_INT_EXTENDED_SECURITY
        | VOL_CAP_INT_USERACCESS
        ;

    mtmp->fAttr.f_attributes.validattr.commonattr  = 0
        | ATTR_CMN_NAME
        | ATTR_CMN_DEVID
        | ATTR_CMN_FSID
        | ATTR_CMN_OBJTYPE
//      | ATTR_CMN_OBJTAG
        | ATTR_CMN_OBJID
//      | ATTR_CMN_OBJPERMANENTID
        | ATTR_CMN_PAROBJID
//      | ATTR_CMN_SCRIPT
        | ATTR_CMN_CRTIME
//      | ATTR_CMN_MODTIME
//      | ATTR_CMN_CHGTIME
//      | ATTR_CMN_ACCTIME
//      | ATTR_CMN_BKUPTIME
//      | ATTR_CMN_FNDRINFO
        | ATTR_CMN_OWNERID
        | ATTR_CMN_GRPID
        | ATTR_CMN_ACCESSMASK
        | ATTR_CMN_FLAGS
//      | ATTR_CMN_USERACCESS
//      | ATTR_CMN_EXTENDED_SECURITY
//      | ATTR_CMN_UUID
//      | ATTR_CMN_GRPUUID
        ;
    mtmp->fAttr.f_attributes.validattr.volattr     = 0
        | ATTR_VOL_FSTYPE
//      | ATTR_VOL_SIGNATURE
        | ATTR_VOL_SIZE
        | ATTR_VOL_SPACEFREE
        | ATTR_VOL_SPACEAVAIL
//      | ATTR_VOL_MINALLOCATION
//      | ATTR_VOL_ALLOCATIONCLUMP
        | ATTR_VOL_IOBLOCKSIZE
        | ATTR_VOL_OBJCOUNT
        | ATTR_VOL_FILECOUNT
        | ATTR_VOL_DIRCOUNT
        | ATTR_VOL_MAXOBJCOUNT
        | ATTR_VOL_MOUNTPOINT
        | ATTR_VOL_NAME
        | ATTR_VOL_MOUNTFLAGS
        | ATTR_VOL_MOUNTEDDEVICE
//      | ATTR_VOL_ENCODINGSUSED
        | ATTR_VOL_CAPABILITIES
        | ATTR_VOL_ATTRIBUTES
        ;
    mtmp->fAttr.f_attributes.validattr.dirattr     = 0
//      | ATTR_DIR_LINKCOUNT
//      | ATTR_DIR_ENTRYCOUNT
//      | ATTR_DIR_MOUNTSTATUS
        ;
    mtmp->fAttr.f_attributes.validattr.fileattr    = 0
//      | ATTR_FILE_LINKCOUNT
        | ATTR_FILE_TOTALSIZE
//      | ATTR_FILE_ALLOCSIZE
        | ATTR_FILE_IOBLOCKSIZE
//      | ATTR_FILE_DEVTYPE
//      | ATTR_FILE_FORKCOUNT
//      | ATTR_FILE_FORKLIST
        | ATTR_FILE_DATALENGTH
        | ATTR_FILE_DATAALLOCSIZE
//      | ATTR_FILE_RSRCLENGTH
//      | ATTR_FILE_RSRCALLOCSIZE
        ;
    mtmp->fAttr.f_attributes.validattr.forkattr    = 0;

    // All attributes that we do support, we support natively.

    mtmp->fAttr.f_attributes.nativeattr.commonattr = mtmp->fAttr.f_attributes.validattr.commonattr;
    mtmp->fAttr.f_attributes.nativeattr.volattr    = mtmp->fAttr.f_attributes.validattr.volattr;
    mtmp->fAttr.f_attributes.nativeattr.dirattr    = mtmp->fAttr.f_attributes.validattr.dirattr;
    mtmp->fAttr.f_attributes.nativeattr.fileattr   = mtmp->fAttr.f_attributes.validattr.fileattr;
    mtmp->fAttr.f_attributes.nativeattr.forkattr   = mtmp->fAttr.f_attributes.validattr.forkattr;
}

static void EmptyFSInitAttr(EmptyFSMount *mtmp)
    // Initialises the fAttr field of the EmptyFSMount with the appropriate
    // static values.  This is done at initialisation time, so we don't have
    // to worry about concurrency.
{
    mtmp->fAttr.f_objcount    = 1;
    mtmp->fAttr.f_filecount   = 0;
    mtmp->fAttr.f_dircount    = 1;
    mtmp->fAttr.f_maxobjcount = 1;
    mtmp->fAttr.f_bsize       = 4096;
    mtmp->fAttr.f_iosize      = 4096;
    mtmp->fAttr.f_blocks      = 1;
    mtmp->fAttr.f_bfree       = 0;
    mtmp->fAttr.f_bavail      = 0;
    mtmp->fAttr.f_bused       = 1;
    mtmp->fAttr.f_files       = 1;
    mtmp->fAttr.f_ffree       = 0;
    mtmp->fAttr.f_fsid.val[0] = mtmp->fBlockRDevNum;
    mtmp->fAttr.f_fsid.val[1] = vfs_typenum(mtmp->fMountPoint);
//  mtmp->fAttr.f_owner = xxx;
    EmptyFSMountInitGetAttrListGoop(mtmp);      // f_capabilities and f_attributes
    nanotime(&mtmp->fAttr.f_create_time);
//  mtmp->fAttr.f_modify_time = xxx;
//  mtmp->fAttr.f_access_time = xxx;
//  mtmp->fAttr.f_backup_time = xxx;
    mtmp->fAttr.f_fssubtype = 0;
    mtmp->fAttr.f_vol_name = mtmp->fVolumeName;
//  mtmp->fAttr.f_signature = xxx;
//  mtmp->fAttr.f_carbon_fsid = xxx;
}

static errno_t EmptyFSMountGetRootVNodeCreatingIfNecessary(EmptyFSMount *mtmp, vnode_t *vnPtr)
    // Returns the root vnode for the volume, creating it if necessary.  The resulting
    // vnode has an I/O reference count, which the caller is responsible for releasing
    // (using vnode_put) or passing along to its caller.
{
    errno_t         err;
    errno_t         junk;
    vnode_t         resultVN;
    uint32_t        vid;

    // Pre-conditions

    assert(mtmp != NULL);
    assert( vnPtr != NULL);
    assert(*vnPtr == NULL);

    // resultVN holds vnode we're going to return in *vnPtr.  If this ever goes non-NULL,
    // we're done.

    resultVN = NULL;

    // First lock the relevant fields of the mount point.

    lck_mtx_lock(mtmp->fRootMutex);

    do {
        // Loop invariants (-:

        assert(resultVN == NULL);       // no point looping if we already have a result

        // lck_mtx_assert is only available in the "com.apple.kpi.unsupported" KPI, so
        // we only use it in debug builds.  Our "Info.plist" file is preprocessed to
        // require the "com.apple.kpi.unsupported" KPI in this case.

        #if MACH_ASSERT
            lck_mtx_assert(mtmp->fRootMutex, LCK_MTX_ASSERT_OWNED);
        #endif

        if (mtmp->fRootAttaching) {
            // If someone else is already trying to create the root vnode, wait for
            // them to get done.  Note that msleep will unlock and relock mtmp->fRootMutex,
            // so once it returns we have to loop and start again from scratch.

            mtmp->fRootWaiting = TRUE;
            (void) msleep(&mtmp->fRootVNode, mtmp->fRootMutex, PINOD, "EmptyFSMountGetRootVNodeCreatingIfNecessary", NULL);
            err = EAGAIN;
        } else if (mtmp->fRootVNode == NULL) {
            vnode_t                 newVN;
            struct vnode_fsparam    params;

            // There is no root vnode, so create it.  While we're creating it, we
            // drop our lock (to avoid the possibility of deadlock), so we set
            // fRootAttaching to stall anyone else entering the code (and eliminate
            // the possibility of two people trying to create the same vnode).

            mtmp->fRootAttaching = TRUE;
            lck_mtx_unlock(mtmp->fRootMutex);

            newVN = NULL;
            params.vnfs_mp         = mtmp->fMountPoint;
            params.vnfs_vtype      = VDIR;
            params.vnfs_str        = NULL;
            params.vnfs_dvp        = NULL;
            params.vnfs_fsnode     = NULL;
            params.vnfs_vops       = gVNodeOperations;
            params.vnfs_markroot   = TRUE;
            params.vnfs_marksystem = FALSE;
            params.vnfs_rdev       = 0;                                 // we don't currently support VBLK or VCHR
            params.vnfs_filesize   = 0;                                 // not relevant for a directory
            params.vnfs_cnp        = NULL;
            params.vnfs_flags      = VNFS_NOCACHE | VNFS_CANTCACHE;     // do no vnode name caching

            err = vnode_create(VNCREATE_FLAVOR, sizeof(params), &params, &newVN);
            assert( (err == 0) == (newVN != NULL) );

            lck_mtx_lock(mtmp->fRootMutex);

            if (err == 0) {
                // If we successfully create the vnode, it's time to install it as
                // the root.  No one else should have been able to get here, so
                // mtmp->fRootVNode should still be NULL.  If it's not, that's bad.

                assert(mtmp->fRootVNode == NULL);
                mtmp->fRootVNode = newVN;

                // Also let the VFS layer know that we have a soft reference to
                // the vnode.

                junk = vnode_addfsref(newVN);
                assert(junk == 0);
            }

            // If anyone got hung up on mtmp->fRootAttaching, unblock them.  We do
            // this whether or not vnode_create succeeded; otherwise a failure would
            // leave fRootAttaching set and stall every later caller.

            assert(mtmp->fRootAttaching);
            mtmp->fRootAttaching = FALSE;
            if (mtmp->fRootWaiting) {
                wakeup(&mtmp->fRootVNode);
                mtmp->fRootWaiting = FALSE;
            }

            // Set up the function result.  Note that vnode_create creates the
            // vnode with an I/O reference count, so we can just return it
            // directly.  If vnode_create failed, mtmp->fRootVNode is still NULL,
            // so resultVN stays NULL and err carries the failure.

            resultVN = mtmp->fRootVNode;
        } else {
            vnode_t     candidateVN;

            // We already have a root vnode.  Drop our lock (again, to avoid deadlocks)
            // and get a reference on it, using the vnode ID (vid) to confirm that it's
            // still valid.  If that works, we're all set.  Otherwise, let's just start
            // again from scratch.

            candidateVN = mtmp->fRootVNode;
            vid = vnode_vid(candidateVN);

            lck_mtx_unlock(mtmp->fRootMutex);

            err = vnode_getwithvid(candidateVN, vid);
            if (err == 0) {
                // All ok.   vnode_getwithvid has taken an I/O reference count on the
                // vnode, so we can just return it to the caller.  This reference
                // prevents the vnode from being reclaimed in the interim.

                resultVN = candidateVN;
                assert(err == 0);
            } else {
                // vnode_getwithvid failed.  This is most likely because the vnode
                // has been reclaimed between dropping the lock and calling vnode_getwithvid.
                // That's cool.  We just loop again, and this time we'll get the updated
                // results (hopefully).

                err = EAGAIN;
            }

            // We need to reacquire the lock because that's the loop invariant.
            // Strictly speaking we don't need to do this in the 'success' case,
            // but it makes the code simpler (and I don't care about the trivial
            // performance cost in this sample).

            lck_mtx_lock(mtmp->fRootMutex);
        }

        // resultVN should only be set if everything is OK.

        assert( (err == 0) == (resultVN != NULL) );
    } while (err == EAGAIN);

    lck_mtx_unlock(mtmp->fRootMutex);

    if (err == 0) {
        *vnPtr = resultVN;
    }

    // Post-conditions

    assert( (err == 0) == (*vnPtr != NULL) );

    return err;
}
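
/*
    Attach-Once Sketch
    ------------------
    The "attaching" handshake used by the routine above can be hard to see among
    the vnode calls, so here is a minimal userland sketch of the same idea.  It
    assumes POSIX threads in place of lck_mtx_t, msleep and wakeup, and every name
    in it (SketchMount, SketchCreateRoot, SketchGetRoot) is hypothetical.  It
    also assumes that creation cannot fail, which is not true of vnode_create.
*/

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for the root fields of the mount point. */
typedef struct SketchMount {
    pthread_mutex_t mutex;          /* protects the fields below */
    pthread_cond_t  attached;       /* signalled when an attach finishes */
    bool            attaching;      /* true while one thread creates the object */
    void *          root;           /* the shared object, or NULL */
    int             createCount;    /* how many times the creator ran */
} SketchMount;

static int gSketchRootStorage;      /* stands in for the object being created */

static void *SketchCreateRoot(SketchMount *m)
    /* Called without the mutex held.  Only the thread that set 'attaching' */
    /* can be here, so createCount needs no further protection. */
{
    m->createCount += 1;
    return &gSketchRootStorage;     /* assumption: creation never fails */
}

static void *SketchGetRoot(SketchMount *m)
    /* Returns the shared object, creating it exactly once. */
{
    void *result = NULL;

    pthread_mutex_lock(&m->mutex);
    while (result == NULL) {
        if (m->attaching) {
            /* Someone else is creating the object; wait, then re-check. */
            pthread_cond_wait(&m->attached, &m->mutex);
        } else if (m->root == NULL) {
            void *newRoot;

            /* Mark the attach, then drop the lock while creating. */
            m->attaching = true;
            pthread_mutex_unlock(&m->mutex);
            newRoot = SketchCreateRoot(m);
            pthread_mutex_lock(&m->mutex);

            /* Install the result and release anyone who waited. */
            m->root = newRoot;
            m->attaching = false;
            pthread_cond_broadcast(&m->attached);
            result = newRoot;
        } else {
            result = m->root;
        }
    }
    pthread_mutex_unlock(&m->mutex);
    return result;
}
```

/*
    The sketch has no counterpart to vnode_getwithvid: its object is never
    reclaimed, so the third branch can return it without revalidating.
*/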

static void EmptyFSMountDetachRootVNode(EmptyFSMount *mtmp, vnode_t vn)
    // Called by higher-level code within our VFS plug-in to reclaim a vnode,
    // that is, for us to 'forget' about it.  We only 'know' about one vnode,
    // the root vnode, so this code is much easier than it would be in a
    // real file system.
{
    int     junk;

    assert(mtmp != NULL);
    assert(vn != NULL);

    lck_mtx_lock(mtmp->fRootMutex);

    // We can ignore mtmp->fRootAttaching here because, if it's set, mtmp->fRootVNode
    // will be NULL.  And, if that's the case, we just do nothing and return.  That's
    // exactly the correct behaviour if the system tries to reclaim the vnode while
    // some other thread is in the process of attaching it.
    //
    // The following assert checks the assumption that makes this all work.

    assert( ! mtmp->fRootAttaching || (mtmp->fRootVNode == NULL) );

    if (mtmp->fRootVNode == NULL) {
        // Do nothing; someone beat us to the reclaim; nothing to do.
    } else {

        // The vnode we're reclaiming should be the root vnode.  If it isn't,
        // I want to know about it.

        assert(mtmp->fRootVNode == vn);

        // Tell VFS that we're removing our soft reference to the vnode.

        junk = vnode_removefsref(mtmp->fRootVNode);
        assert(junk == 0);

        mtmp->fRootVNode = NULL;
    }

    lck_mtx_unlock(mtmp->fRootMutex);
}

#if MACH_ASSERT

    static boolean_t ValidVNode(vnode_t vn)
        // Returns true if the vnode is valid on our file system.
        // In this case, the only valid vnode is the root vnode,
        // so the implementation is trivial.
    {
        boolean_t       result;
        EmptyFSMount *  mtmp;

        assert(vn != NULL);

        mtmp = EmptyFSMountFromMount( vnode_mount(vn) );

        lck_mtx_lock(mtmp->fRootMutex);
        result = (vn == mtmp->fRootVNode);
        lck_mtx_unlock(mtmp->fRootMutex);

        return result;
    }

#endif

/////////////////////////////////////////////////////////////////////

#pragma mark ***** VNode Operations

static errno_t VNOPLookup(struct vnop_lookup_args *ap)

    // This is called by VFS to do a directory lookup.

//

    // dvp is the directory to search.

//

    // cnp describes the name to search for.  This is kinda complicated, although

    // the comments in <sys/vnode.h> are pretty helpful.

//

    // vpp is a pointer to a vnode where we return the found item.  The

    // returned vnode must have an I/O reference, and the caller is responsible

    // for releasing it.

//

    // context identifies the calling process.

    errno_t                 err;

    vnode_t                 dvp;

    vnode_t *               vpp;

    struct componentname *  cnp;

    vfs_context_t           context;

    vnode_t                 vn;

    // Unpack arguments

    dvp     = ap->a_dvp;

    vpp     = ap->a_vpp;

    cnp     = ap->a_cnp;

    context = ap->a_context;

    // Pre-conditions

    assert(dvp != NULL);

    assert(vnode_isdir(dvp));

    assert( ValidVNode(dvp) );

    assert(vpp != NULL);

    assert(cnp != NULL);

    assert(context != NULL);

    // Prepare for failure.

    vn = NULL;

    // Trivial implementation

    if (cnp->cn_flags & ISDOTDOT) {

        // Implement lookup for ".." (that is, the parent directory).  As we currently

        // only support one directory (the root directory) and the parent of the root

        // is always the root, this is trivial (and, incidentally, exactly the same

        // as the code for ".", but that wouldn't be true in a more general VFS plug-in).

        // We just get an I/O reference on dvp and return that.

        err = vnode_get(dvp);

        if (err == 0) {

            vn = dvp;

        }

    } else if ( (cnp->cn_namelen == 1) && (cnp->cn_nameptr[0] == '.') ) {

        // Implement lookup for "." (that is, this directory).  Just get an I/O reference

        // to dvp and return that.

        err = vnode_get(dvp);

        if (err == 0) {

            vn = dvp;

        }

    } else {

        err = ENOENT;

    }

    // Under all circumstances we set *vpp to vn.  That way, we satisfy the

    // post-condition, regardless of what VFS uses as the initial value for

    // *vpp.

    *vpp = vn;

    // Post-conditions

    assert( (err == 0) == (*vpp != NULL) );

    return err;

static errno_t VNOPOpen(struct vnop_open_args *ap)

    // Called by VFS to open a vnode for access.

//

    // vp is the vnode that's being opened.

//

    // mode contains the flags passed to open (things like FREAD).

//

    // context identifies the calling process.

//

    // This entry is rarely useful because VFS can read a file vnode without ever

    // opening it, thus any work that you'd usually do here you have to do lazily in

    // your read/write entry points.

//

    // Regardless, in our implementation we have nothing to do.

    vnode_t         vp;

    int             mode;

    vfs_context_t   context;

    // Unpack arguments

    vp      = ap->a_vp;

    mode    = ap->a_mode;

    context = ap->a_context;

    // Pre-conditions

    assert( ValidVNode(vp) );

    AssertKnownFlags(mode, O_EVTONLY | O_NONBLOCK | FREAD | FWRITE);

    assert(context != NULL);

    // Empty implementation

    assert(vnode_isdir(vp));

    return 0;

static errno_t VNOPClose(struct vnop_close_args *ap)

    // Called by VFS to close a vnode for access.

//

    // vp is the vnode that's being closed.

//

    // fflags contains the flags associated with the close (things like FREAD).

//

    // context identifies the calling process.

//

    // This entry is not as useful as you might think because a vnode can be accessed

    // after the last close (if, for example, it has been memory mapped).  In most cases

    // the work that you might think to do here, you end up doing in VNOPInactive.

//

    // Regardless, in our implementation we have nothing to do.

    vnode_t         vp;

    int             fflag;

    vfs_context_t   context;

    // Unpack arguments

    vp      = ap->a_vp;

    fflag   = ap->a_fflag;

    context = ap->a_context;

    // Pre-conditions

    assert( ValidVNode(vp) );

    AssertKnownFlags(fflag, O_EVTONLY | O_NONBLOCK | FREAD | FWRITE);

    assert(context != NULL);

    // Empty implementation

    assert(vnode_isdir(vp));

    return 0;

static errno_t VNOPGetattr(struct vnop_getattr_args *ap)

    // Called by VFS to get information about a vnode (this is called by the

    // VFS implementation of <x-man-page://2/stat> and <x-man-page://2/getattrlist>).

//

    // vp is the vnode whose information is requested.

//

    // vap describes the attributes requested and the place to store the results.

//

    // context identifies the calling process.

//

    // You have two options for doing this:

//

    // o For attributes whose values you have readily available, use the VATTR_RETURN

    //   macro to unilaterally return the value.

//

    // o For attributes whose values are hard to calculate, use VATTR_IS_ACTIVE to see

    //   if the caller requested the attribute and, if so, copy the value into the

    //   appropriate field.

//

    // Our implementation is trivial; we just return statically configured values.

    vnode_t             vp;

    struct vnode_attr * vap;

    vfs_context_t       context;

    EmptyFSMount *      mtmp;

    static const struct timespec kYearZero = {0, 0};

    // Unpack arguments

    vp      = ap->a_vp;

    vap     = ap->a_vap;

    context = ap->a_context;

    // Pre-conditions

    assert( ValidVNode(vp) );

    assert(vap != NULL);

    assert(context != NULL);

    // Trivial implementation

    assert(vnode_isdir(vp));

    mtmp = EmptyFSMountFromMount(vnode_mount(vp));

    // The implementation of <x-man-page://2/stat> requires that we support va_rdev,

    // even on vnodes that aren't device vnodes (as is the case for all our vnodes).

    VATTR_RETURN(vap, va_rdev,        0);

    VATTR_RETURN(vap, va_nlink,       2);           // traditional for directories

//  VATTR_RETURN(vap, va_total_size,  xxx);

//  VATTR_RETURN(vap, va_total_alloc, xxx);

    VATTR_RETURN(vap, va_data_size,   2 * sizeof(struct dirent));

//  VATTR_RETURN(vap, va_data_alloc,  xxx);

//  VATTR_RETURN(vap, va_iosize,      xxx);

//  VATTR_RETURN(vap, va_uid,   xxx);

//  VATTR_RETURN(vap, va_gid,   xxx);

    VATTR_RETURN(vap, va_mode,  S_IFDIR | S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);

//  VATTR_RETURN(vap, va_flags, xxx);

//  VATTR_RETURN(vap, va_acl,   xxx);

    // The only date we really keep track of is the creation date.  However,

    // the implementation of <x-man-page://2/stat> requires that we support

    // the other dates (that is, it does a VATTR_WANTED on these dates and

    // doesn't check that we returned them, or initialise them to a default

    // value).  I didn't want to lie to the system and just return dummy values,

    // and I also didn't want to get random numbers back for these dates.

    // Thus, I initialise the fields to default values but don't mark them

    // as supported.

    VATTR_RETURN(vap, va_create_time, mtmp->fAttr.f_create_time);

//  VATTR_RETURN(vap, va_access_time, xxx);

    vap->va_access_time = kYearZero;

//  VATTR_RETURN(vap, va_modify_time, xxx);

    vap->va_modify_time = kYearZero;

//  VATTR_RETURN(vap, va_change_time, xxx);

    vap->va_change_time = kYearZero;

//  VATTR_RETURN(vap, va_backup_time, xxx);

    VATTR_RETURN(vap, va_fileid,   2);

//  VATTR_RETURN(vap, va_linkid,   xxx);

//  VATTR_RETURN(vap, va_parentid, xxx);

    VATTR_RETURN(vap, va_fsid,     mtmp->fBlockRDevNum);

//  VATTR_RETURN(vap, va_filerev,  xxx);

//  VATTR_RETURN(vap, va_gen,      xxx);

//  VATTR_RETURN(vap, va_encoding, xxx);

//  VATTR_RETURN(vap, va_type,  xxx);                   // handled by VFS

//  VATTR_RETURN(vap, va_name,  xxx);                   // let VFS get this from f_mntonname

//  VATTR_RETURN(vap, va_uuuid, xxx);

//  VATTR_RETURN(vap, va_guuid, xxx);

//  VATTR_RETURN(vap, va_nchildren, xxx);

    return 0;
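
// The two options described in the VNOPGetattr header comment (return cheap
// values unconditionally, compute expensive values only on request) can be
// modelled in user space.  What follows is a minimal sketch under stated
// assumptions, not the kernel's implementation: SketchAttr, its bit names,
// and SketchExpensiveDataSize are hypothetical stand-ins and are not the
// kernel's struct vnode_attr or its VATTR_ macros.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical attribute bits; the kernel uses its own per-field masks.
enum {
    kSketchAttrLinkCount = 1u << 0,
    kSketchAttrDataSize  = 1u << 1
};

typedef struct {
    uint32_t active;      // attributes the caller asked for
    uint32_t supported;   // attributes the file system actually filled in
    uint32_t linkCount;
    uint64_t dataSize;
} SketchAttr;

static int gSketchExpensiveCalls = 0;

// Stands in for a value that is costly to compute.
static uint64_t SketchExpensiveDataSize(void)
{
    gSketchExpensiveCalls += 1;
    return 2048;
}

static void SketchGetAttr(SketchAttr *attr)
{
    // Option 1: the value is readily available, so return it unconditionally.
    attr->linkCount  = 2;
    attr->supported |= kSketchAttrLinkCount;

    // Option 2: the value is expensive, so compute it only if requested.
    if (attr->active & kSketchAttrDataSize) {
        attr->dataSize   = SketchExpensiveDataSize();
        attr->supported |= kSketchAttrDataSize;
    }
}
```

// A caller that never asks for the data size never pays for it, and can tell
// from the supported bits that the field was not filled in.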

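static errno_t uiomove_atomic(void *addr, size_t size, uio_t uio)

    // A wrapper around uiomove that copies either all of the data or none

    // of it.  If size bytes won't fit in the space remaining in uio, it

    // copies nothing and returns ENOBUFS.

{

    errno_t     err;

    if (size > uio_resid(uio)) {

        err = ENOBUFS;

    } else {

        err = uiomove(addr, size, uio);

    }

    return err;

}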
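
// The all-or-nothing behaviour of uiomove_atomic can be tried out in user
// space.  This is a minimal sketch, assuming a plain memory buffer in place
// of a real uio; SketchBuffer and SketchMoveAtomic are hypothetical names
// introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for a uio: a write cursor plus the space left.
typedef struct {
    char *  next;       // where the next byte will be written
    size_t  resid;      // bytes of space remaining
} SketchBuffer;

// Copies all of src or none of it; never a partial record.
static int SketchMoveAtomic(const void *src, size_t size, SketchBuffer *buf)
{
    if (size > buf->resid) {
        return ENOBUFS;             // buffer left untouched
    }
    memcpy(buf->next, src, size);
    buf->next  += size;
    buf->resid -= size;
    return 0;
}
```

// The point of checking first is that a failed call leaves both the buffer
// contents and the residual count exactly as they were.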

static errno_t VNOPReadDir(struct vnop_readdir_args *ap)

    // Called by VFS to iterate the contents of a directory (most notably

    // by the implementation of <x-man-page://2/getdirentries>).

//

    // vp is the directory we're iterating.

//

    // uio describes the buffer into which we copy the (struct dirent) values

    // that represent directory entries; it is discussed in detail below.

//

    // flags contains two options bits, VNODE_READDIR_EXTENDED and

    // VNODE_READDIR_REQSEEKOFF, neither of which we support (they're only

    // needed if the file system is to be NFS exported).

//

    // eofflagPtr, if not NULL, is a place to indicate that we've read the

    // last directory entry.

//

    // numdirentPtr, if not NULL, is a place to return a count of the

    // number of directory entries that we've returned.

//

    // context identifies the calling process.

//

    // The hardest thing to understand about this entry point is the UIO

    // management.  There are two tricky aspects:

//

    // o The UIO offset (accessed via uio_offset and uio_setoffset)

    //   determines the first directory item read.  This does not have

    //   to literally be an offset into the directory (such a usage makes

    //   sense on a UFS-style file system, but it makes no sense for a

    //   file system, like HFS Plus, which has no obvious directory offset).

    //   Rather, the semantics are as follows:

//

    //   - A UIO offset of zero indicates that you should read from the

    //     start of the directory.

//

    //   - You are responsible for setting the UIO offset to indicate how

    //     much you read.

//

    //   - This offset value can then be passed back to you to continue

    //     reading at that offset.

//

    //   So, if you have a file system where you can index directory items,

    //   it's perfectly reasonable for you to use an index as the UIO offset.

    //   However, there are some gotchas:

//

    //   - The UIO offset is an off_t, so you might think that you have 64 bits

    //     to play with.  However, this is truncated down to a long in the

    //     basep parameter of getdirentries, so you only have 32 bits (because

    //     a long is 32 bits for 32-bit client processes).

//

    //   - Furthermore, you only /actually/ have 31 bits, because longs are

    //     signed, and if you return a negative offset then, if the client

    //     tries to lseek <x-man-page://2/lseek> to that offset (which is a

    //     legal usage pattern), lseek will fail (because it arbitrarily

    //     disallows negative offsets, even for directories).

//

    //   - Remember that uiomove increments the UIO offset by the number of bytes

    //     that it copies.  Typically this is not useful behaviour for directories.

    //     In most cases you will want to explicitly set the UIO offset

    //     (using uio_setoffset) before you return.

//

    //   - Because the offset can be set by untrusted programs (using lseek),

    //     you must be able to safely (that is, without kernel panicking!)

    //     reject illegal offsets.  If the client calls getdirentries after seeking

    //     to a bogus offset, you should return EINVAL.

//

    //   - Depending on your volume format, it may be expensive to verify that

    //     the offset is valid.  In that case, you may want to cache the last

    //     offset that you returned in your FSNode.  There are two things to be careful

    //     about here:

//

    //     - Make sure you invalidate the cache if you do something that changes whether

    //       an offset is valid.

//

    //     - Be aware that you may need more than one cache entry, because multiple

    //       clients may be reading the directory simultaneously.  Remember, while

    //       each client gets their own file descriptor, there's only one FSNode

    //       for any given on-disk directory.

//

    // o The UIO resid (residual count, accessed by uio_resid and uio_setresid)

    //   indicates how much space is left in the user buffer described by the UIO.

    //   You must update this as you copy data out into that buffer (fortunately,

    //   the obvious copying routine, uiomove, does this update for you).  The VFS

    //   layer uses this value to calculate the return value for the

    //   getdirentries system call.  That is, the return value of

    //   getdirentries is the original buffer size minus this UIO resid.

    //   So, if you completely fill the user's buffer (hence resid is

    //   0), getdirentries will return the original buffer size.

    //   On the other hand, if you return no data, resid will be equal

    //   to the buffer size, and getdirentries will return 0 (an indication

    //   that there are no more items in the directory).

//

    //   It's also worth noting that there is no guarantee that the

    //   user's buffer size will be an even multiple of your dirent

    //   size (in fact, there's no requirement for you to have a

    //   fixed dirent size).  Thus, even after you've filled the user's

    //   buffer (you've copied out all of the entries that will fit),

    //   it's possible for resid to be positive.  Under no circumstances

    //   should you copy out a partial dirent.

//

    // o uiomove does not error if it only copies out a part of the data

    //   that you requested.  You should call uio_resid to ensure that

    //   there's enough space for the entire dirent before calling uiomove.

//

    // Make sure you read <x-man-page://5/dirent> for information about

    // (struct dirent).  Specifically, this page defines constraints on

    // (struct dirent) with which you must comply.

//

    // On success, *eofflagPtr is TRUE if we've returned the last

    // entry in this directory.  The NFS server uses this information

    // to tag the reply packet that contains this entry with an EOF

    // marker; this avoids the need for the client to make another

    // call to confirm that it has read the entire directory.

//

    // On success, *numdirentPtr is the number of dirent structures

    // that we read.

//

    // Our implementation is very easy, simply because we only have one directory

    // (the root) and it only has two entries ("." and "..").  Note that we /don't/

    // check for available space in the user's buffer; we just cook up the next

    // directory entry and allow our uiomove_atomic wrapper to error if there's not

    // enough space.  This is convenient for our code and, because of the trivial cost

    // to set up thisItem, not a performance problem.  If setting up thisItem was

    // expensive, or there was a fixed cost for accessing a directory that we could

    // amortise over multiple entries, it would be sensible to look at uio_resid to

    // see how many entries to generate up front.

    errno_t         err;

    vnode_t         vp;

    struct uio *    uio;

    int             flags;

    int *           eofflagPtr;

    int             eofflag;

    int *           numdirentPtr;

    int             numdirent;

    vfs_context_t   context;

    // Unpack arguments

    vp           = ap->a_vp;

    uio          = ap->a_uio;

    flags        = ap->a_flags;

    eofflagPtr   = ap->a_eofflag;

    numdirentPtr = ap->a_numdirent;

    context      = ap->a_context;

    // Pre-conditions

    assert( ValidVNode(vp) );

    assert(uio != NULL);

    AssertKnownFlags(flags, VNODE_READDIR_EXTENDED | VNODE_READDIR_REQSEEKOFF);

    // assert(eofflagPtr != NULL);     // it's fine for this to be NULL

    // assert(numdirentPtr == NULL);   // this is NULL in the typical case

    assert(context != NULL);

    // An easy, but non-trivial, implementation

    assert(vnode_isdir(vp));

    eofflag = FALSE;

    numdirent = 0;

    if ( (flags & VNODE_READDIR_EXTENDED) || (flags & VNODE_READDIR_REQSEEKOFF) ) {

        // We only need to support these flags if we want to support being exported

        // by NFS.

        err = EINVAL;

    } else {

        struct dirent   thisItem;

        off_t           index;

        err = 0;

        // Set up thisItem.

        thisItem.d_fileno = 2;

        thisItem.d_reclen = sizeof(thisItem);

        thisItem.d_type = DT_DIR;

        strcpy(thisItem.d_name, ".");

        thisItem.d_namlen = strlen(".");

        // We set uio_offset to the directory item index * 7 to:

//

        // o Illustrate the points about uio_offset usage in the comment above.

//

        // o Allow us to check that we're getting valid input.

//

        // However, be aware of the comments above about not trusting uio_offset;

        // the client can set it to an arbitrary value using lseek.

        assert( (uio_offset(uio) % 7) == 0);

        index = uio_offset(uio) / 7;

        // If we're being asked for the first directory entry...

        if (index == 0) {

            err = uiomove_atomic(&thisItem, sizeof(thisItem), uio);

            if (err == 0) {

                numdirent += 1;

                index += 1;

            }

        }

        // If we're being asked for the second directory entry...

        if ( (err == 0) && (index == 1) ) {

            strcpy(thisItem.d_name, "..");

            thisItem.d_namlen = strlen("..");

            err = uiomove_atomic(&thisItem, sizeof(thisItem), uio);

            if (err == 0) {

                numdirent += 1;

                index += 1;

            }

        }

        // If we failed because there wasn't enough space in the user's buffer,

        // just swallow the error.  This will result in getdirentries returning

        // less than the buffer size (possibly even zero), and the caller is

        // expected to cope with that.

        if (err == ENOBUFS) {

            err = 0;

        }

        // Update uio_offset.

        uio_setoffset(uio, index * 7);

        // Determine if we're at the end of the directory.

        eofflag = (index > 1);

    }

    // Copy out any information that's requested by the caller.

    if (eofflagPtr != NULL) {

        *eofflagPtr = eofflag;

    }

    if (numdirentPtr != NULL) {

        *numdirentPtr = numdirent;

    }

    return err;
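
// The offset and resid rules described in the VNOPReadDir header comment can
// be exercised outside the kernel.  This is a minimal user-space sketch under
// stated assumptions, not the code above: SketchUIO and SketchReadDir are
// hypothetical, entries are only counted rather than copied, and (unlike
// VNOPReadDir, which asserts) a cookie that could not have come from an
// earlier call is rejected with EINVAL, as the header comment recommends.
// Treating an index beyond the end of the directory as invalid is also an
// assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { kSketchCookieStride = 7, kSketchEntryCount = 2 };

typedef struct {
    int64_t offset;   // directory cookie; the caller can set it (as with lseek)
    size_t  resid;    // bytes of space left in the caller's buffer
} SketchUIO;

// Returns 0 or EINVAL.  On success *countPtr is the number of whole entries
// that fitted and *eofPtr is non-zero once the last entry has been returned.
static int SketchReadDir(SketchUIO *uio, size_t entrySize, int *countPtr, int *eofPtr)
{
    int64_t index;

    *countPtr = 0;
    *eofPtr   = 0;

    // The cookie is untrusted: it must be non-negative, fit in 31 bits,
    // and be a multiple of the stride we hand out.
    if ( (uio->offset < 0) || (uio->offset > INT32_MAX)
         || ((uio->offset % kSketchCookieStride) != 0) ) {
        return EINVAL;
    }
    index = uio->offset / kSketchCookieStride;
    if (index > kSketchEntryCount) {
        return EINVAL;
    }

    // Hand out whole entries only; never a partial one.
    while ( (index < kSketchEntryCount) && (entrySize <= uio->resid) ) {
        uio->resid -= entrySize;
        *countPtr  += 1;
        index      += 1;
    }

    // Set the cookie explicitly rather than relying on a byte count.
    uio->offset = index * kSketchCookieStride;
    *eofPtr = (index >= kSketchEntryCount);
    return 0;
}
```

// Note that resid can stay positive after the buffer is "full": with 50 bytes
// of space and 40-byte entries, one entry fits and 10 bytes go unused.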

static errno_t VNOPReclaim(struct vnop_reclaim_args *ap)

    // Called by VFS to disassociate this vnode from the underlying FSNode.

//

    // vp is the vnode to reclaim.

//

    // context identifies the calling process.

//

    // This operation should be relatively cheap; it is /not/ the point where,

    // for example, you should write the FSNode back to disk (rather, you should

    // do that in your VNOPInactive entry point).

//

    // IMPORTANT:

    // If VNOPReclaim fails, the system panics.

//

    // In our implementation this is relatively easy because we only support one

    // vnode.  Still, there are some tricky race conditions to ponder.  In a proper

    // file system, this entry point would have to be coordinated with the FSNode

    // hash layer.

    vnode_t         vp;

    vfs_context_t   context;

    EmptyFSMount *  mtmp;

    // Unpack arguments

    vp           = ap->a_vp;

    context      = ap->a_context;

    // Pre-conditions

    assert(vp != NULL);

    assert( ValidVNode(vp) );

    assert(context != NULL);

    // Do this as an 'FSNode hash' layer would.

    mtmp = EmptyFSMountFromMount(vnode_mount(vp));

    EmptyFSMountDetachRootVNode(mtmp, vp);

    return 0;

/////////////////////////////////////////////////////////////////////

#pragma mark ***** VFS Operations

static errno_t VFSOPUnmount(mount_t mp, int mntflags, vfs_context_t context);

    // forward declaration

static errno_t VFSOPMount(mount_t mp, vnode_t devvp, user_addr_t data, vfs_context_t context)

    // Called by VFS to mount an instance of our file system.

//

    // mp is a reference to the kernel structure tracking this instance of the

    // file system.

//

    // devvp is either:

    //   o an open vnode for the block device on which we're mounted, or

    //   o NULL

    // depending on the VFS_TBLLOCALVOL flag in the vfe_flags field of the vfs_fsentry

    // that we registered.  In the former case, the first field of our file system specific

    // mount arguments must be a pointer to a C string holding the UTF-8 path to the block

    // device node.

//

    // data is a pointer to our file system specific mount arguments in the address

    // space of the current process (the one that called mount).  This is a parameter

    // block passed to us by our mount tool telling us what to mount and how.  Because

    // VFS_TBLLOCALVOL is set, the first field of this structure must be a pointer to the

    // path of the block device node; the kernel interprets this parameter, opening up

    // the node for us.

//

    // IMPORTANT:

    // If VFS_TBLLOCALVOL is set, the first field of the file system specific mount

    // parameters is interpreted by the kernel AND THE KERNEL INCREMENTS data TO POINT

    // TO THE FIELD AFTER THE PATH.  We handle this by defining our mount parameter

    // structure (EmptyFSMountArgs) in two ways: for user space code, the first field

    // (fDevNodePath) is a pointer to the block device node path; for kernel code, we omit

    // this field.

//

    // IMPORTANT:

    // If your file system claims to be 64-bit ready (VFS_TBL64BITREADY is set), you must

    // be prepared to handle mount requests from both 32- and 64-bit processes.  Thus,

    // your file system specific mount parameters must be either 32/64-bit invariant

    // (as is the case for this example), or you must interpret them differently depending

    // on the type of process you're being called by (see proc_is64bit from <sys/proc.h>).

//

    // context identifies the calling process.

    int                 err;

    int                 junk;

    EmptyFSMountArgs    args;

    EmptyFSMount *      mtmp;

    // Pre-conditions

    assert(mp != NULL);

    assert(devvp != NULL);

    assert(data != 0);

    assert(context != NULL);

    mtmp = NULL;

    // This example does not support updating a volume's state (for example,

    // upgrading it from read-only to read/write).

    err = 0;

    if ( vfs_isupdate(mp) ) {

        err = ENOTSUP;

    }

    // Copy in the mount arguments and use them to initialise our mount

    // structure.

    if (err == 0) {

        err = copyin(data, &args, sizeof(EmptyFSMountArgs));

    }

    if (err == 0) {

        if ( args.fMagic != kEmptyFSMountArgsMagic ) {

            err = EINVAL;

        }

    }

    if (err == 0) {

        mtmp = OSMalloc(sizeof(*mtmp), gOSMallocTag);

        if (mtmp == NULL) {

            err = ENOMEM;

        } else {

            memset(mtmp, 0, sizeof(*mtmp));

            mtmp->fMagic = kEmptyFSMountMagic;

            vfs_setfsprivate(mp, mtmp);

        }

    }

    // Fill out the fields in our mount point.

    if (err == 0) {

        // Start with stuff that can fail.

        // We don't really need to take a use count reference to the device vnode

        // because the system has done this for us.  However, it doesn't hurt and it

        // panders to my paranoia.

        err = vnode_ref(devvp);

        if (err == 0) {

            mtmp->fBlockDevVNode = devvp;

            mtmp->fBlockRDevNum  = vnode_specrdev(devvp);

        }

        if (err == 0) {

            mtmp->fRootMutex = lck_mtx_alloc_init(gLockGroup, NULL);

            if (mtmp->fRootMutex == NULL) {

                err = ENOMEM;

            }

        }

        // Then do the stuff that can't fail.

        // IMPORTANT

        // EmptyFSInitAttr reads mtmp->fBlockRDevNum, so you must initialise it before

        // calling EmptyFSInitAttr.

        if (err == 0) {

            mtmp->fMountPoint = mp;

            mtmp->fDebugLevel = args.fDebugLevel;

            strncpy(mtmp->fVolumeName, "EmptyFS", sizeof(mtmp->fVolumeName));

            mtmp->fVolumeName[sizeof(mtmp->fVolumeName) - 1] = 0;

            EmptyFSInitAttr(mtmp);

            assert( ! mtmp->fRootAttaching);

            assert( ! mtmp->fRootWaiting);

            assert(mtmp->fRootVNode == NULL);

        }

    }

    // Set up the statfs information.  You can get a pointer to the vfsstatfs

    // that you need to fill out by calling vfs_statfs.  Before calling your

    // mount entry point, VFS has already zeroed the entire structure and set

    // up f_fstypename, f_mntonname, f_mntfromname (if VFC_VFSLOCALARGS was set;

    // in the other case VFS doesn't know this information and you have to set it

    // yourself), and f_owner.  You are responsible for filling out the other fields

    // (except f_reserved1, f_type, and f_flags, which are reserved).  You can also

    // override VFS's settings if need be.

//

    // The following code snippet just sets the values to sensible defaults.

    // IMPORTANT:

    // It is vital that you fill out all of these fields (especially the

    // f_bsize, f_bfree, and f_bavail fields) before returning from VFSOPMount.

    // If you don't, higher-level system components (such as File Manager) can

    // get very confused.  Specifically, File Manager can get and /cache/ these

    // values before calling VFSOPGetattr.  So you can't rely on a call to

    // VFSOPGetattr to set up these fields for the first time.

    if (err == 0) {

        struct vfsstatfs *  sbp;

        sbp = vfs_statfs(mp);

        assert(sbp != NULL);

        assert( strcmp(sbp->f_fstypename, "EmptyFS") == 0 );

        sbp->f_bsize  = mtmp->fAttr.f_bsize;

        sbp->f_iosize = mtmp->fAttr.f_iosize;

        sbp->f_blocks = mtmp->fAttr.f_blocks;

        sbp->f_bfree  = mtmp->fAttr.f_bfree;

        sbp->f_bavail = mtmp->fAttr.f_bavail;

        sbp->f_bused  = mtmp->fAttr.f_bused;

        sbp->f_files  = mtmp->fAttr.f_files;

        sbp->f_ffree  = mtmp->fAttr.f_ffree;

        sbp->f_fsid   = mtmp->fAttr.f_fsid;

    }

    vfs_setflags(mp, 0

        | MNT_RDONLY

//      | MNT_SYNCHRONOUS

        | MNT_NOEXEC

        | MNT_NOSUID

        | MNT_NODEV

//      | MNT_UNION

//      | MNT_ASYNC

//      | MNT_DONTBROWSE

        | MNT_IGNORE_OWNERSHIP

//      | MNT_AUTOMOUNTED

//      | MNT_JOURNALED

//      | MNT_NOUSERXATTR

//      | MNT_DEFWRITE

//      | MNT_EXPORTED

//      | MNT_LOCAL

//      | MNT_QUOTA

//      | MNT_ROOTFS

//      | MNT_DOVOLFS

);

    // Don't think you need to call vnode_setmountedon because the system does it for you.

    if (err == 0) {

        if (args.fForceFailure) {

            // By setting the above to true, you can force a mount failure, which

            // allows you to test the unmount path.

            printf("EmptyFS:VFSOPMount: mount succeeded, force failure\n");

            err = ENOTSUP;

        } else {

            printf("EmptyFS:VFSOPMount: mount succeeded\n");

        }

    } else {

        printf("EmptyFS:VFSOPMount: mount failed with error %d\n", err);

    }

    // If we return an error, our unmount VFSOP is never called.  Thus, we have

    // to clean up ourselves.

    if (err != 0) {

        junk = VFSOPUnmount(mp, MNT_FORCE, context);

        assert(junk == 0);

    }

    return err;
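
// VFSOPMount chains its steps on err and, on any failure, hands the partially
// built mount to a single tear-down routine that tolerates missing pieces.
// The shape of that pattern can be shown in user space.  This is a minimal
// sketch under stated assumptions: SketchMount, SketchSetUp, SketchTearDown,
// and the failAtStep knob are hypothetical, and malloc/free stand in for the
// kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct {
    void *  record;     // stands in for the per-mount record
    void *  lock;       // stands in for the root mutex
} SketchMount;

// Safe to call on a partially initialised mount: free(NULL) is a no-op.
static void SketchTearDown(SketchMount *m)
{
    free(m->lock);
    free(m->record);
    m->lock   = NULL;
    m->record = NULL;
}

// failAtStep simulates an allocation failure at step 1 or 2; 0 means no failure.
static int SketchSetUp(SketchMount *m, int failAtStep)
{
    int err = 0;

    m->record = NULL;
    m->lock   = NULL;

    // Each step runs only if every earlier step succeeded.
    if (err == 0) {
        m->record = (failAtStep == 1) ? NULL : malloc(16);
        if (m->record == NULL) {
            err = ENOMEM;
        }
    }
    if (err == 0) {
        m->lock = (failAtStep == 2) ? NULL : malloc(16);
        if (m->lock == NULL) {
            err = ENOMEM;
        }
    }

    // One clean-up path, regardless of which step failed.
    if (err != 0) {
        SketchTearDown(m);
    }
    return err;
}
```

// Because the tear-down checks each field, the same routine serves both the
// failed-set-up path and the normal shut-down path.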

static errno_t VFSOPStart(mount_t mp, int flags, vfs_context_t context)

    // Called by VFS to confirm the mount.

//

    // mp is a reference to the kernel structure tracking this instance of the

    // file system.

//

    // flags is reserved.

//

    // context identifies the calling process.

//

    // This entry point isn't particularly useful; to avoid concurrency problems

    // you should do all of your initialisation before returning from VFSOPMount.

//

    // Moreover, it's not necessary to implement this because the kernel glue

    // (VFS_START) ignores a NULL entry and returns ENOTSUP, and the caller ignores

    // that error.

//

    // Still, I implement it just in case.

    // Pre-conditions

    assert(mp != NULL);

    AssertKnownFlags(flags, 0);

    assert(context != NULL);

    return 0;

static errno_t VFSOPUnmount(mount_t mp, int mntflags, vfs_context_t context)

    // Called by VFS to unmount a volume.  Also called by our VFSOPMount code

    // to clean up if something goes wrong.

//

    // mp is a reference to the kernel structure tracking this instance of the

    // file system.

//

    // mntflags is a set of flags; currently only MNT_FORCE is defined.

//

    // context identifies the calling process.

    int             err;

    boolean_t       forcedUnmount;

    EmptyFSMount *  mtmp;

    int             flushFlags;

    // Pre-conditions

    assert(mp != NULL);

    AssertKnownFlags(mntflags, MNT_FORCE);

    assert(context != NULL);

    // Implementation

    forcedUnmount = (mntflags & MNT_FORCE) != 0;

    if (forcedUnmount) {

        flushFlags = FORCECLOSE;

    } else {

        flushFlags = 0;

    }

    // Prior to calling us, VFS has flushed all regular vnodes (that is, it called

    // vflush with SKIPSWAP, SKIPSYSTEM, and SKIPROOT set).  Now we have to flush

    // all vnodes, including the root.  If flushFlags is FORCECLOSE, this is a

    // forced unmount (which will succeed even if there are files open on the volume).

    // In this case, if a vnode can't be flushed, vflush will disconnect it from the

    // mount.

    err = vflush(mp, NULL, flushFlags);

    // Clean up the file system specific data attached to the mount.

    if (err == 0) {

        // If VFSOPMount fails, it's possible for us to end up here without a

        // valid file system specific mount record.  We skip the clean up if

        // that happens.

        if ( vfs_fsprivate(mp) != NULL ) {

            mtmp = EmptyFSMountFromMount(mp);

            if (mtmp->fBlockDevVNode != NULL) {         // release our reference, if any

                vnode_rele(mtmp->fBlockDevVNode);

                mtmp->fBlockDevVNode = NULL;

                mtmp->fBlockRDevNum = 0;

            }

            // Prior to calling us, VFS ensures that no one is running within

            // our file system.  Thus, neither of these flags should be set.

            assert( ! mtmp->fRootAttaching);

            assert( ! mtmp->fRootWaiting);

            // The vflush, above, forces VFS to reclaim any vnodes on our volume.

            // Thus, fRootVNode should be NULL.

            assert(mtmp->fRootVNode == NULL);

            if (mtmp->fRootMutex != NULL) {

                lck_mtx_free(mtmp->fRootMutex, gLockGroup);

            }

            mtmp->fMagic = kEmptyFSMountBadMagic;

            OSFree(mtmp, sizeof(*mtmp), gOSMallocTag);

        }

    }

    return err;

static errno_t VFSOPRoot(mount_t mp, struct vnode **vpp, vfs_context_t context)

    // Called by VFS to get the root vnode of this instance of the file system.

//

    // mp is a reference to the kernel structure tracking this instance of the

    // file system.

//

    // vpp is a pointer to a vnode reference.  On success, we must set this to

    // the root vnode.  We must have an I/O reference on that vnode, and it's

    // the caller's responsibility to release it.

//

    // context identifies the calling process.

//

    // Our implementation is fairly simple.

    errno_t         err;

    vnode_t         vn;

    EmptyFSMount *  mtmp;

    // Pre-conditions

    assert(mp != NULL);

    assert(vpp != NULL);

    assert(context != NULL);

    // Trivial implementation

    mtmp = EmptyFSMountFromMount(mp);

    vn = NULL;

    err = EmptyFSMountGetRootVNodeCreatingIfNecessary(mtmp, &vn);

    // Under all circumstances we set *vpp to vn.  That way, we satisfy the

    // post-condition, regardless of what VFS uses as the initial value for

    // *vpp.

    *vpp = vn;

    // Post-conditions

    assert( (err != 0) || (*vpp != NULL) );

    return err;

static errno_t VFSOPGetattr(mount_t mp, struct vfs_attr *attr, vfs_context_t context)

    // Called by VFS to get information about this instance of the file system.

//

    // mp is a reference to the kernel structure tracking this instance of the

    // file system.

//

    // attr describes the attributes requested and the place to store the results.

//

    // context identifies the calling process.

//

    // Like VNOPGetattr, you have two macros that let you a) return values easily

    // (VFSATTR_RETURN), and b) see if you need to return a value (VFSATTR_IS_ACTIVE).

//

    // Our implementation is trivial because we pre-calculated all of the file

    // system attributes in a convenient form.

{

    EmptyFSMount *  mtmp;

    // Pre-conditions

    assert(mp != NULL);

    assert(attr != NULL);

    assert(context != NULL);

    // Trivial implementation

    mtmp = EmptyFSMountFromMount(mp);

    VFSATTR_RETURN(attr, f_objcount,     mtmp->fAttr.f_objcount);

    VFSATTR_RETURN(attr, f_filecount,    mtmp->fAttr.f_filecount);

    VFSATTR_RETURN(attr, f_dircount,     mtmp->fAttr.f_dircount);

    VFSATTR_RETURN(attr, f_maxobjcount,  mtmp->fAttr.f_maxobjcount);

    VFSATTR_RETURN(attr, f_bsize,        mtmp->fAttr.f_bsize);

    VFSATTR_RETURN(attr, f_iosize,       mtmp->fAttr.f_iosize);

    VFSATTR_RETURN(attr, f_blocks,       mtmp->fAttr.f_blocks);

    VFSATTR_RETURN(attr, f_bfree,        mtmp->fAttr.f_bfree);

    VFSATTR_RETURN(attr, f_bavail,       mtmp->fAttr.f_bavail);

    VFSATTR_RETURN(attr, f_bused,        mtmp->fAttr.f_bused);

    VFSATTR_RETURN(attr, f_files,        mtmp->fAttr.f_files);

    VFSATTR_RETURN(attr, f_ffree,        mtmp->fAttr.f_ffree);

    VFSATTR_RETURN(attr, f_fsid,         mtmp->fAttr.f_fsid);

    VFSATTR_RETURN(attr, f_capabilities, mtmp->fAttr.f_capabilities);

    VFSATTR_RETURN(attr, f_attributes,   mtmp->fAttr.f_attributes);

    VFSATTR_RETURN(attr, f_create_time,  mtmp->fAttr.f_create_time);

    VFSATTR_RETURN(attr, f_fssubtype,    mtmp->fAttr.f_fssubtype);

    if (VFSATTR_IS_ACTIVE(attr, f_vol_name) ) {

        strncpy(attr->f_vol_name, mtmp->fAttr.f_vol_name, MAXPATHLEN);

        attr->f_vol_name[MAXPATHLEN - 1] = 0;

        VFSATTR_SET_SUPPORTED(attr, f_vol_name);

    }

    return 0;

}

/////////////////////////////////////////////////////////////////////

#pragma mark ***** Configuration Data

typedef errno_t (*VNodeOp)(void *);

// gVNodeOperationEntries is an array that describes all of the vnode operations

// supported by vnodes created by our VFS plug-in.  This is, in turn, wrapped up

// by gVNodeOperationVectorDesc and gVNodeOperationVectorDescList, and it's this

// last variable that's referenced by gVFSEntry.

// The following is a list of all of the vnode operations supported on

// Mac OS X 10.4, with the ones that we support uncommented.

static struct vnodeopv_entry_desc gVNodeOperationEntries[] = {

//  { &vnop_access_desc,        (VNodeOp) VNOPAccess      },

//  { &vnop_advlock_desc,       (VNodeOp) VNOPAdvlock     },

//  { &vnop_allocate_desc,      (VNodeOp) VNOPAllocate    },

//  { &vnop_blktooff_desc,      (VNodeOp) VNOPBlktooff    },

//  { &vnop_blockmap_desc,      (VNodeOp) VNOPBlockmap    },

//  { &vnop_bwrite_desc,        (VNodeOp) VNOPBwrite      },

    { &vnop_close_desc,         (VNodeOp) VNOPClose       },

//  { &vnop_copyfile_desc,      (VNodeOp) VNOPCopyfile    },

//  { &vnop_create_desc,        (VNodeOp) VNOPCreate      },

    { &vnop_default_desc,       (VNodeOp) vn_default_error},

//  { &vnop_exchange_desc,      (VNodeOp) VNOPExchange    },

//  { &vnop_fsync_desc,         (VNodeOp) VNOPFsync       },

    { &vnop_getattr_desc,       (VNodeOp) VNOPGetattr     },

//  { &vnop_getattrlist_desc,   (VNodeOp) VNOPGetattrlist },            // not useful, implement getattr instead

//  { &vnop_getxattr_desc,      (VNodeOp) VNOPGetxattr    },

//  { &vnop_inactive_desc,      (VNodeOp) VNOPInactive    },

//  { &vnop_ioctl_desc,         (VNodeOp) VNOPIoctl       },

//  { &vnop_link_desc,          (VNodeOp) VNOPLink        },

//  { &vnop_listxattr_desc,     (VNodeOp) VNOPListxattr   },

    { &vnop_lookup_desc,        (VNodeOp) VNOPLookup      },

//  { &vnop_mkdir_desc,         (VNodeOp) VNOPMkdir       },

//  { &vnop_mknod_desc,         (VNodeOp) VNOPMknod       },

//  { &vnop_mmap_desc,          (VNodeOp) VNOPMmap        },

//  { &vnop_mnomap_desc,        (VNodeOp) VNOPMnomap      },

//  { &vnop_offtoblk_desc,      (VNodeOp) VNOPOfftoblk    },

    { &vnop_open_desc,          (VNodeOp) VNOPOpen        },

//  { &vnop_pagein_desc,        (VNodeOp) VNOPPagein      },

//  { &vnop_pageout_desc,       (VNodeOp) VNOPPageout     },

//  { &vnop_pathconf_desc,      (VNodeOp) VNOPPathconf    },

//  { &vnop_read_desc,          (VNodeOp) VNOPRead        },

    { &vnop_readdir_desc,       (VNodeOp) VNOPReadDir     },

//  { &vnop_readdirattr_desc,   (VNodeOp) VNOPReaddirattr },

//  { &vnop_readlink_desc,      (VNodeOp) VNOPReadlink    },

    { &vnop_reclaim_desc,       (VNodeOp) VNOPReclaim     },

//  { &vnop_remove_desc,        (VNodeOp) VNOPRemove      },

//  { &vnop_removexattr_desc,   (VNodeOp) VNOPRemovexattr },

//  { &vnop_rename_desc,        (VNodeOp) VNOPRename      },

//  { &vnop_revoke_desc,        (VNodeOp) VNOPRevoke      },

//  { &vnop_rmdir_desc,         (VNodeOp) VNOPRmdir       },

//  { &vnop_searchfs_desc,      (VNodeOp) VNOPSearchfs    },

//  { &vnop_select_desc,        (VNodeOp) VNOPSelect      },

//  { &vnop_setattr_desc,       (VNodeOp) VNOPSetattr     },

//  { &vnop_setattrlist_desc,   (VNodeOp) VNOPSetattrlist },            // not useful, implement setattr instead

//  { &vnop_setxattr_desc,      (VNodeOp) VNOPSetxattr    },

//  { &vnop_strategy_desc,      (VNodeOp) VNOPStrategy    },

//  { &vnop_symlink_desc,       (VNodeOp) VNOPSymlink     },

//  { &vnop_whiteout_desc,      (VNodeOp) VNOPWhiteout    },

//  { &vnop_write_desc,         (VNodeOp) VNOPWrite       },

    { NULL, NULL }

};

// gVNodeOperationVectorDesc points to our vnode operations array

// (gVNodeOperationEntries) and to a place (gVNodeOperations) where the

// system, on successful registration, stores a final vnode array that's

// used to create our vnodes.

static struct vnodeopv_desc gVNodeOperationVectorDesc = {

    &gVNodeOperations,                          // opv_desc_vector_p

    gVNodeOperationEntries                      // opv_desc_ops

};

// gVNodeOperationVectorDescList is an array of vnodeopv_desc that allows us to

// register multiple vnode operations arrays at the same time.  A full-featured

// file system would use this to register different arrays for standard vnodes,

// device vnodes (VBLK and VCHR), and FIFO vnodes (VFIFO).  In our case, we only

// support standard vnodes, so our array only has one entry.

static struct vnodeopv_desc *gVNodeOperationVectorDescList[1] =

{

    &gVNodeOperationVectorDesc

};

// gVFSOps is a structure that contains pointers to all of the VFSOP routines.

// These are routines that operate on instances of the file system (rather than

// on vnodes).

static struct vfsops gVFSOps = {

    VFSOPMount,                                 // vfs_mount

    VFSOPStart,                                 // vfs_start

    VFSOPUnmount,                               // vfs_unmount

    VFSOPRoot,                                  // vfs_root

    NULL,                                       // vfs_quotactl

    VFSOPGetattr,                               // vfs_getattr

    NULL,                                       // vfs_sync

    NULL,                                       // vfs_vget

    NULL,                                       // vfs_fhtovp

    NULL,                                       // vfs_vptofh

    NULL,                                       // vfs_init

    NULL,                                       // vfs_sysctl

    NULL,                                       // vfs_setattr

    {NULL, NULL, NULL, NULL, NULL, NULL, NULL}  // vfs_reserved

};

// gVFSEntry describes the overall VFS plug-in.  It's passed as a parameter

// to vfs_fsadd to register this file system.

static struct vfs_fsentry gVFSEntry = {

    &gVFSOps,                                               // vfe_vfsops

    sizeof(gVNodeOperationVectorDescList) / sizeof(*gVNodeOperationVectorDescList),

                                                            // vfe_vopcnt

    gVNodeOperationVectorDescList,                          // vfe_opvdescs

    0,                                                      // vfe_fstypenum, see VFS_TBLNOTYPENUM below

    "EmptyFS",                                              // vfe_fsname

                                                            // vfe_flags

          VFS_TBLTHREADSAFE             // we do our own internal locking and thus don't need funnel protection

        | VFS_TBLFSNODELOCK             // ditto

        | VFS_TBLNOTYPENUM              // we don't have a pre-defined file system type (the VT_XXX constants

                                        // in <sys/vnode.h>); VFS should dynamically assign us a type

        | VFS_TBLLOCALVOL               // our file system is local; causes MNT_LOCAL to be set and indicates

                                        // that the first field of our file system specific mount arguments

                                        // is a path to a block device

        | VFS_TBL64BITREADY,            // we are 64-bit aware; our mount, ioctl and sysctl entry points

                                        // can be called by both 32-bit and 64-bit processes; we will use

                                        // the type of process to interpret our arguments (if they're not

                                        // 32/64-bit invariant)

    {NULL, NULL}                                            // vfe_reserv

};

static vfstable_t gVFSTableRef = NULL;

/////////////////////////////////////////////////////////////////////

#pragma mark ***** KEXT Load/Unload

// Prototypes for our main entry points to satisfy the strict error check we

// have enabled.  We also force the symbols to be exported.

extern kern_return_t MODULE_START(kmod_info_t * ki, void * d);

extern kern_return_t MODULE_STOP (kmod_info_t * ki, void * d);

extern kern_return_t MODULE_START(kmod_info_t * ki, void * d)

    // Called by the kernel to initialise the KEXT.  The main feature of

    // this routine is a call to vfs_fsadd to register our VFS plug-in.

{

    #pragma unused(ki)

    #pragma unused(d)

    errno_t             err;

    kern_return_t       kernErr;

    assert(gVFSTableRef == NULL);           // just in case we get loaded twice (which shouldn't ever happen)

    kernErr = InitMemoryAndLocks();

    err = ErrnoFromKernReturn(kernErr);

    if (err == 0) {

        err = vfs_fsadd(&gVFSEntry, &gVFSTableRef);

    }

    if (err != 0) {

        TermMemoryAndLocks();

    }

    return KernReturnFromErrno(err);

}

extern kern_return_t MODULE_STOP(kmod_info_t * ki, void * d)

    // Called by the kernel to terminate the KEXT.  The main feature of

    // this routine is a call to vfs_fsremove to deregister our VFS plug-in.

    // If this fails (which it will if any of our volumes are mounted), the KEXT

    // can't be unloaded.

{

    #pragma unused(ki)

    #pragma unused(d)

    errno_t             err;

    err = vfs_fsremove(gVFSTableRef);

    if (err == 0) {

        gVFSTableRef = NULL;

        TermMemoryAndLocks();

    }

    return KernReturnFromErrno(err);

}
