Building a Directory Hierarchy Snapshot

File system events tell you that something in a given directory changed. In some cases, this is sufficient—for example, if your application is a print or mail spooler, all it needs to know is that a file has been added to the directory.

In other cases, however, this is not enough, and you need to know precisely what changed within the directory. The simplest way to solve this problem is to take a snapshot of the directory hierarchy, storing your own copy of the state of the system at a given point in time. You might, for example, store a list of filenames and last modified dates, thus allowing you to determine which files have been modified since the last time you performed a backup.

You do this by iterating through the hierarchy and building up a data structure of your choice. As you cache this metadata, if you see changes during the caching process, you can reread the directory or directories that changed to obtain an updated snapshot. Once you have a cached tree of metadata that accurately reflects the current state of the hierarchy you are concerned with, you can then determine what file or files changed within a directory or hierarchy (after a file system event notification) by comparing the current directory state with your snapshot.
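The comparison step described above can be sketched as follows. This is a minimal illustration, not the method used by any Apple sample: the struct file_info record and the diff_snapshots function are hypothetical names, and the sketch assumes each snapshot of a directory is an array of records sorted by filename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical snapshot record: one per file in a directory. */
struct file_info {
    const char *name;
    time_t mtime;
};

/* Compare two snapshots of the same directory, each sorted by name.
   Prints one line per difference and returns the number of differences. */
size_t diff_snapshots(const struct file_info *before, size_t nbefore,
                      const struct file_info *after, size_t nafter)
{
    size_t i = 0, j = 0, changes = 0;

    assert(before != NULL || nbefore == 0);
    assert(after != NULL || nafter == 0);

    while (i < nbefore || j < nafter) {
        int order;
        if (i == nbefore) order = 1;          /* only new entries remain */
        else if (j == nafter) order = -1;     /* only old entries remain */
        else order = strcmp(before[i].name, after[j].name);

        if (order < 0) {
            printf("removed:  %s\n", before[i++].name);
            changes++;
        } else if (order > 0) {
            printf("added:    %s\n", after[j++].name);
            changes++;
        } else {
            /* Same name in both snapshots: compare modification times. */
            if (before[i].mtime != after[j].mtime) {
                printf("modified: %s\n", before[i].name);
                changes++;
            }
            i++;
            j++;
        }
    }
    return changes;
}
```

Because both arrays are sorted, one pass over each is enough. A real snapshot would probably record more than the modification time (size and inode number, for example).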

Important: To avoid missing changes, you must start monitoring the directory before you start scanning it. Because of the inherently non-deterministic latency in any notification mechanism on a multitasking operating system, it may not always be obvious whether the action that triggered an event occurred before or after a nested subdirectory was scanned. To guarantee that no changes are lost, it is best to always rescan any subdirectory that is modified during scanning rather than taking a time stamp for each subdirectory and trying to compare those time stamps with event time stamps.
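The rescan rule can be sketched as a loop that repeats the scan until one complete pass finishes without an event arriving. Everything named here (struct dir_state, note_event, scan_until_stable, demo_scan) is hypothetical. The sketch assumes monitoring is already running and that events are delivered on the same thread as the scan; if they can arrive on another thread, the flag would need proper synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-directory state shared with the event callback. */
struct dir_state {
    const char *path;
    bool changed_during_scan;  /* set whenever an event names this directory */
    int scans;                 /* number of completed scan passes */
};

/* Stand-in for a real event callback: marks the directory as modified. */
void note_event(struct dir_state *dir)
{
    dir->changed_during_scan = true;
}

/* Scan dir repeatedly until a complete pass sees no events.
   scan is a caller-supplied function that reads the directory.
   Returns true once the snapshot is stable, false after max_passes. */
bool scan_until_stable(struct dir_state *dir,
                       void (*scan)(struct dir_state *), int max_passes)
{
    assert(dir != NULL && scan != NULL && max_passes > 0);

    for (int pass = 0; pass < max_passes; pass++) {
        dir->changed_during_scan = false;  /* clear before the pass starts */
        scan(dir);
        dir->scans++;
        if (!dir->changed_during_scan)
            return true;                   /* nothing changed during this pass */
        printf("%s changed during scan, rescanning\n", dir->path);
    }
    return false;
}

/* Demo scanner: pretends an event arrives during the first two passes. */
void demo_scan(struct dir_state *dir)
{
    if (dir->scans < 2)
        note_event(dir);
}
```

The loop never compares time stamps. It relies only on whether an event arrived while the pass was in progress, which is the approach the note above recommends.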

Mac OS X provides a number of APIs that can make this easier. The scandir(3) function returns an array of directory entries that you can quickly iterate through. This is somewhat easier than reading a directory manually with opendir(3), readdir(3), and so on, and is slightly more efficient since you will always iterate through the entire directory while caching anyway.

The binary tree functions tsearch(3), tfind(3), twalk(3), and tdelete(3) can simplify working with large search trees. In particular, binary trees are an easy way of quickly finding the cached file information from a particular directory. The following code snippet demonstrates the proper way to call these functions:

Listing 2-1  Using the tsearch, tfind, twalk, and tdelete API.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>
#include <string.h>
#include <search.h>
 
int array[] = { 1, 17, 2432, 645, 2456, 1234, 6543, 214, 3, 45, 34 };
void *dirtree;
 
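/* Compare two keys. Each argument points to an int stored in array[]. */
static int cmp(const void *a, const void *b) {
    const int x = *(const int *)a;
    const int y = *(const int *)b;
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}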
 
void printtree(void);
 
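/* This demonstration ignores its command-line arguments. */
int main(int argc, char *argv[])
{
    size_t i;
    for (i = 0; i < sizeof(array) / sizeof(array[0]); i++) {
        /* tsearch takes the address of the root variable and returns a
           pointer to the tree node, or NULL if no node could be allocated. */
        void *x = tsearch(&array[i], &dirtree, &cmp);
        printf("Inserted %p\n", x);
    }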
 
    printtree();
 
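    /* tdelete returns a pointer to the parent of the removed node, or NULL
       if the key was not found. The result is only meaningful as a parent
       when the removed node was not the root of the tree. */
    void *parent_node = tdelete(&array[2], &dirtree, &cmp);
    printf("Deleted value %d (parent node %p contains %d)\n",
        array[2], parent_node, **(int**)parent_node);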
 
    for (i=0; i< sizeof(array) / sizeof(array[0]); i++) {
        void *node = tfind(&array[i], &dirtree, &cmp);
        if (node) {
            int **x = node;
            printf("Found %d (%d) at %p\n", array[i], **x, node);
        } else {
            printf("Not found: %d\n", array[i]);
        }
    }
    exit(0);
}
 
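/* twalk callback. node is the address of a pointer to the stored int,
   so it must be dereferenced twice to reach the value. */
static void printme(const void *node, VISIT v, int k)
{
    const int *myint = *(const int * const *)node;
    (void)k; /* the depth argument is not used here */
    /* Print each node exactly once, in sorted order. */
    if (v != postorder && v != leaf) return;
    printf("%d\n", *myint);
}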
 
void printtree(void)
{
    twalk(dirtree, &printme);
}

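Two unusual design decisions in this API can make it tricky to use correctly if you haven’t used it before on other UNIX-based or UNIX-like operating systems. Both are visible in Listing 2-1. First, tsearch, tfind, and tdelete take the address of the tree root variable (&dirtree), whereas twalk takes the root variable itself (dirtree). Second, the values returned by tsearch and tfind, and the value passed to the twalk callback, are not pointers to your data. Each is the address of a location that holds a pointer to your data, which is why the listing dereferences them twice to reach the integer. The tdelete function differs again: it returns the parent of the node that it deleted.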
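Applied to cached file metadata rather than integers, the same calls might look like the sketch below. The struct cached_file record and the functions cache_put and cache_get are hypothetical names chosen for illustration. The sketch assumes entries are keyed by filename and that the tree owns each record it stores.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical cache record, keyed by filename. */
struct cached_file {
    char *name;
    time_t mtime;
};

static void *cache_root = NULL;

static int by_name(const void *a, const void *b)
{
    const struct cached_file *x = a;
    const struct cached_file *y = b;
    return strcmp(x->name, y->name);
}

/* Insert a record, or update the existing one with the same name.
   Returns 0 on success, -1 if memory could not be allocated. */
int cache_put(const char *name, time_t mtime)
{
    assert(name != NULL);

    size_t len = strlen(name) + 1;
    struct cached_file *rec = malloc(sizeof *rec);
    char *copy = malloc(len);
    if (rec == NULL || copy == NULL) {
        free(rec);
        free(copy);
        return -1;
    }
    memcpy(copy, name, len);
    rec->name = copy;
    rec->mtime = mtime;

    /* tsearch returns the address of the slot that points to the record. */
    struct cached_file **slot = tsearch(rec, &cache_root, by_name);
    if (slot != NULL && *slot == rec)
        return 0;                   /* new record is now owned by the tree */
    if (slot != NULL)
        (*slot)->mtime = mtime;     /* name already cached: update in place */
    free(copy);
    free(rec);
    return slot != NULL ? 0 : -1;
}

/* Look up a record by name. Returns NULL if the name is not cached. */
const struct cached_file *cache_get(const char *name)
{
    struct cached_file key = { (char *)name, 0 };
    struct cached_file **slot = tfind(&key, &cache_root, by_name);
    return slot != NULL ? *slot : NULL;
}
```

Note the single dereference of slot in both functions: the returned value points at the stored pointer, not at the record itself.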

The POSIX functions stat(2) and lstat(2) provide easy access to file metadata. These two functions differ in their treatment of symbolic links. The lstat function provides information about the link itself, while the stat function provides information about the file that the link points to. Generally speaking, when working with file system event notifications, you will probably want to use lstat, because changes to the underlying file will not result in a change notification for the directory containing the symbolic link to that file. However, if you are working with a controlled file structure in which symbolic links always point within your watched tree, you might have reason to use stat.
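The difference between the two calls can be seen in a short sketch. The function name compare_link_and_target is hypothetical; lstat, stat, and the S_ISLNK macro are the standard POSIX interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Report how lstat and stat differ for path.
   Returns 1 if path is a symbolic link, 0 if it is not,
   and -1 if path cannot be examined at all. */
int compare_link_and_target(const char *path)
{
    struct stat link_info, target_info;

    assert(path != NULL);

    /* lstat describes the directory entry itself, never following a link. */
    if (lstat(path, &link_info) != 0) {
        perror(path);
        return -1;
    }
    if (!S_ISLNK(link_info.st_mode)) {
        printf("%s is not a symbolic link\n", path);
        return 0;
    }
    printf("%s: link modified at %ld\n", path, (long)link_info.st_mtime);

    /* stat follows the link and describes the file it points to. */
    if (stat(path, &target_info) != 0)
        printf("%s: target is missing (dangling link)\n", path);
    else
        printf("%s: target modified at %ld\n", path,
               (long)target_info.st_mtime);
    return 1;
}
```

For a symbolic link, the two modification times are independent: editing the target changes only the value reported by stat.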

For an example of a tool that builds a directory snapshot, see the Watcher sample code.





Last updated: 2008-03-11





Copyright © 2007 Apple Inc.
All rights reserved.