Sherlock's Find By Content Library

This Technote describes the Find by Content libraries used by Sherlock for searching the contents of files.

The Find by Content libraries export a full suite of routines and functions allowing applications to perform content-based searches of files.

With Mac OS 8.6, Text Extractor Plug-ins were introduced. These allow Find By Content to extract textual information from binary files for inclusion in index files. Text Extractor Plug-ins are documented in Technote TN1181, "Find by Content Text Extractor Plug-ins."

This Note is directed at application developers who wish to access the Find By Content library directly from their applications.

 Updated: [Oct 5 1999]






Overview

The Find By Content (FBC) facilities provided in Mac OS 8.6 are implemented in a PowerPC Code Fragment Manager library that resides in the "Extensions" folder. The Sherlock application is a client of FBC, accessing FBC services through this shared library. Developer applications can also access the search facilities provided by this library. This section describes how developers can create products that access the new FBC facilities through this shared library.

Compiler interfaces to FBC are found in the C header file "FindByContent.h". For linking purposes, the Code Fragment Manager library implementing FBC is named "Find By Content" (without the quotes). Developers using the FBC routines described herein should weak-link against this library, and then check the Gestalt selectors from within their application before calling any of these routines.



Determining if Find By Content is Available

FBC defines two Gestalt selectors. Clients of FBC must verify that the correct version of the implementation is available before making any of these calls, and will want to check the FBC indexing state before performing any searches.



enum {
    gestaltFBCVersion            = 'fbcv',
    gestaltFBCCurrentVersion     = 0x0011
};



The gestaltFBCVersion selector returns the version of FBC that is installed on the computer. Developers can compare this version with gestaltFBCCurrentVersion, the version of the interface with which they compiled their programs, to determine if it is safe to make any calls to FBC. If gestaltFBCVersion produces some version other than the version of the interface your application has been compiled to run with, then your application should not make any calls to FBC.



enum {
    gestaltFBCIndexingState      = 'fbci',
    gestaltFBCindexingSafe       = 0,
    gestaltFBCindexingCritical   = 1
};



The gestaltFBCIndexingState selector returns information about the current indexing status of FBC. At any given time, the indexing status will be either gestaltFBCindexingSafe or gestaltFBCindexingCritical. If the status is gestaltFBCindexingCritical, then any search will result in a synchronous wait until the state returns to gestaltFBCindexingSafe. When the FBC indexing state returned is gestaltFBCindexingSafe, then all searches will execute immediately. To avoid synchronous waits, developers should check the gestaltFBCIndexingState selector and only make calls to FBC when the indexing state returned is gestaltFBCindexingSafe.
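The two checks described above can be folded into a single gate that a client consults before each search. The following is only a sketch: it assumes the caller has already obtained both responses from Gestalt, it restates the constants locally so that it compiles without the Mac OS headers, and the helper names FBCVersionMatches and FBCCanSearchNow are hypothetical rather than part of FBC.

```c
#include <assert.h>
#include <stdbool.h>

/* Restated from the enums above so this sketch compiles on its own.
   A real client would get these from the system headers instead. */
enum {
    gestaltFBCCurrentVersion   = 0x0011,
    gestaltFBCindexingSafe     = 0,
    gestaltFBCindexingCritical = 1
};

/* Hypothetical helper: true only when the installed FBC version is exactly
   the version this client was compiled against. */
bool FBCVersionMatches(long installedVersion)
{
    return installedVersion == gestaltFBCCurrentVersion;
}

/* Hypothetical helper: true when FBC may be called and a search would
   start without a synchronous wait. Both arguments are assumed to be
   responses already returned by Gestalt. */
bool FBCCanSearchNow(long installedVersion, long indexingState)
{
    if (!FBCVersionMatches(installedVersion))
        return false;

    /* The indexing state is documented as one of these two values. */
    assert(indexingState == gestaltFBCindexingSafe ||
           indexingState == gestaltFBCindexingCritical);

    return indexingState == gestaltFBCindexingSafe;
}
```

A client that receives false because the state is critical can simply try again later; a client that receives false because of a version mismatch should not call FBC at all.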



Working with Search Sessions

FBC allows client applications to open and close a "search session." A search session contains all of the information about a search, including the list of matched files after the search is complete. Clients of FBC can obtain references to search sessions, modify them, and query their state using the routines defined in this section. References to search sessions are defined as an opaque pointer type owned by the FBC library.



typedef struct OpaqueFBCSearchSession* FBCSearchSession;


Developers should only access the search session structure using the routines defined herein. This includes using the appropriate FBC routines for duplicating and disposing of search sessions. Search sessions are complex memory structures that contain pointers to other data that may need to be copied when a search session is duplicated or disposed of when a search session is deallocated.

The normal sequence of actions one takes when using the FBC library is to create a search session, configure the search session to target specific volumes, perform the search, query the search results, and dispose of the search. Other possibilities for searches include the ability to reinitialize a search session and use it over again for another search, to provide backtracking by cloning search sessions and performing additional searches using the clones, or to limit search results to files found in particular directories.



Setting up a Search Session

Creating a new session and preparing it for a search, as shown in Listing 6, requires at least two calls to the FBC library. In this example, a new search session is created and it is configured to search all local volumes that contain index files. The call to FBCAddAllVolumesToSession automatically configures the search session to search all indexed volumes.




/* SimpleSetUpSession allocates a new search session and
    returns a FBCSearchSession value in the *session
    parameter. if an error occurs, *session is left
    untouched. */

OSErr SimpleSetUpSession(FBCSearchSession* session) {
    OSErr err;
    FBCSearchSession newsession;

        /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (session == NULL) return paramErr;

        /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

        /* search all available local volumes */
    err = FBCAddAllVolumesToSession(newsession, false);
    if (err != noErr) goto bail;

        /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}

Listing 6. Setting up a search session to search all local, indexed volumes.

FBC provides a complete set of routines for developers wanting more control over what volumes will be searched by the search session. Listing 7 illustrates how a new search session could be configured to search a particular set of volumes.




/* SetUpVolumeSession allocates a new search session and
    returns a FBCSearchSession value in the *session parameter.
    vRefNums points to an array of vCount volume reference
    numbers for the volumes that are to be searched; vCount
    must not be zero.  if any of the vRefNums refer to a volume
    without an index, paramErr is returned.  */

OSErr SetUpVolumeSession (FBCSearchSession* session,
                            UInt16 vCount, SInt16 *vRefNums) {
    OSErr err;
    UInt16 i;
    FBCSearchSession newsession;

        /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (vCount == 0) return paramErr;
    if (session == NULL) return paramErr;
    if (vRefNums == NULL) return paramErr;

        /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

        /* search the volumes specified in vRefNums */

    for (i=0; i<vCount; i++) {
        if (!FBCVolumeIsIndexed(vRefNums[i])) {
            err = paramErr;
            goto bail;
        } else {
            err = FBCAddVolumeToSession(newsession,
                                        vRefNums[i]);
            if (err != noErr) goto bail;
        }
    }

        /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}

Listing 7. Setting up a session to search a particular set of volumes.

In this example, the FBCAddVolumeToSession routine is used to add volumes to the search session. Other routines for querying what volumes are currently targeted by a search session and removing volumes from that list are provided.

Once a search session has been configured to search a number of volumes, it can be used again after a search has been conducted without having to reconfigure its target volumes. After performing a search and examining the results, the search session can be prepared for another search by calling the routine FBCReleaseSessionHits. This routine releases all of the search results from the last search while leaving the list of target volumes intact.

Making a copy of a search session using the routine FBCCloneSearchSession will copy the list of target volumes to the duplicate search session.



Performing Searches

When FBC performs a search, it will generate a list of files that were matched. This list is referred to as the "hits," and it is stored inside of the search session. FBC can be asked to perform a content-based search using a query string containing a list of words, a similarity search based on one or more hits obtained in a previous search, or a similarity search based on a list of example files. Listing 8 illustrates how a query-based search can be performed. Here, the query is used to search for matching files on all local indexed volumes.




OSErr SimpleFindByQuery (char *query, FBCSearchSession *session) {
    OSErr err;
    FBCSearchSession newsession;

        /* set up locals, check parameters... */
    if (query == NULL) return paramErr;
    if (query[0] == 0) return paramErr;
    if (session == NULL) return paramErr;
    newsession = NULL;

        /* allocate a new search session */
    err = SimpleSetUpSession(&newsession);
    if (err != noErr) goto bail;

        /* Here is the call that does the actual search,
        storing the results in the search session. */
    err = FBCDoQuerySearch(newsession, query,
                                   NULL, 0, 100, 100);
    if (err != noErr) goto bail;

        /* save the results and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}

Listing 8. A query-based search of all local, indexed volumes.

Searches conducted using either the routine FBCDoExampleSearch or the routine FBCBlindExampleSearch can be used to locate files that are similar to other files. Similarity searches will locate files with similar content to the files specified as examples. Examples can be specified as indexes referring to hits obtained from previous searches, or as a list of FSSpec records referring to files on disk.

All three of the search routines - FBCDoExampleSearch, FBCBlindExampleSearch, and FBCDoQuerySearch - provide support for limiting the search results to files residing in one or more directories. To do this, clients provide a list of FSSpec records referring to target directories. The example in Listing 9 illustrates how to limit the results of a search to a particular set of directories.




enum {
    kMaxVols = 20,
    maxHits = 10,
    maxHitTerms = 10
};

OSErr RestrictedFindByQuery (char *query, UInt16 dirCount,
                                  FSSpec* dirList,
                                      FBCSearchSession* session) {
    OSErr err;
    UInt16 vCount, i, j;
    SInt16 vRefNums[kMaxVols], normalVol;
    FBCSearchSession newsession;

    vCount = 0;
    newsession = NULL;
    if (dirList == NULL || dirCount == 0) return paramErr;
    if (query == NULL) return paramErr;
    if (*query == 0) return paramErr;
    if (session == NULL) return paramErr;

        /* collect all of the unique volume reference numbers
        from the list of FSSpecs provided in the parameters. */
    for (i=0; i<dirCount; i++) {
        Boolean found;
        HParamBlockRec pb;

            /* ensure the vRefNum is a volume
            reference number */
        pb.volumeParam.ioVRefNum = dirList[i].vRefNum;
        pb.volumeParam.ioNamePtr = NULL;
        pb.volumeParam.ioVolIndex = 0;
        if ((err = PBHGetVInfoSync(&pb)) != noErr) goto bail;
        normalVol = pb.volumeParam.ioVRefNum;

            /* make sure it's not already in the list */
        for (found = false, j=0; j<vCount; j++)
            if (vRefNums[j] == normalVol) {
                found = true;
                break;
            }

            /* add the volume to the list */
        if (!found && vCount < kMaxVols)
            vRefNums[vCount++] = normalVol;
    }

        /* set up a session to use the volumes we found */
    err = SetUpVolumeSession(&newsession, vCount, vRefNums);
    if (err != noErr) goto bail;

        /* Here is the call that does the actual search,
        storing the results in the search session. */
    err = FBCDoQuerySearch(newsession, query,
                    dirList, dirCount, maxHits, maxHitTerms);
    if (err != noErr) goto bail;

        /* save the result and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}

Listing 9. Searching a particular set of directories.

Here, the volume reference numbers extracted from the FSSpec records in dirList are used to configure the volumes that the search session will search. The list of target directories is then passed to FBCDoQuerySearch.
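Listing 9 quietly ignores any volume beyond kMaxVols. The de-duplication step can also be written as a separate routine that reports the overflow to the caller, as in the sketch below. The name CollectUniqueVRefNums is hypothetical, plain short stands in for SInt16 so that the code compiles without the Mac OS headers, and the input values are assumed to have been normalized to true volume reference numbers already (Listing 9 does this with PBHGetVInfoSync).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies each distinct value of in[0..inCount-1] to
   out, keeping first-seen order. Returns the number of values written, or
   -1 if more than outMax distinct values are present. */
int CollectUniqueVRefNums(const short *in, size_t inCount,
                          short *out, size_t outMax)
{
    size_t count = 0, i, j;

    assert(out != NULL && (in != NULL || inCount == 0));

    for (i = 0; i < inCount; i++) {
            /* skip values that are already in the output list */
        for (j = 0; j < count; j++)
            if (out[j] == in[i]) break;
        if (j < count) continue;

            /* report overflow instead of silently dropping a volume */
        if (count == outMax) return -1;
        out[count++] = in[i];
    }
    return (int)count;
}
```

Whether an overflow should abort the search or merely be noted is a design choice for the client; returning a distinct value leaves that decision to the caller.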

Retrieving Information from a Search Session

After a search is conducted using a search session, the search session may contain information about one or more matching files. Clients can access information about individual hits including the file's FSSpec record, the words that were matched in the file, the "score" assigned to the file during the last search operation, and additional information about the file. Listing 10 illustrates how one could obtain information about each hit returned by a search.



typedef OSErr (*HitProc) (const FSSpec* theDoc,
                             float score,
                             UInt32 nTerms,
                             FBCWordList hitTerms);

/* SampleHandleHits can be called after a search to enumerate
    the search results.  For each search hit, the hitFileProc
    function parameter is called with information describing
    the target.  */
OSErr SampleHandleHits (FBCSearchSession session,
                                   HitProc hitFileProc) {
    OSErr err;
    UInt32 hitCount, i;
    FSSpec targetDoc;
    float targetScore;
    FBCWordList targetTerms;
    UInt32 numTerms;

        /* set up locals, check parameters */
    targetTerms = NULL;
    if (hitFileProc == NULL) return paramErr;
    if (session == NULL) return paramErr;

        /* count the number of hits in this session */
    err = FBCGetHitCount(session, &hitCount);
    if (err != noErr) goto bail;

        /* iterate through the hits */
    for (i = 0; i < hitCount; i++) {

            /* get the target document's FSSpec */
        err = FBCGetHitDocument(session, i, &targetDoc);
        if (err != noErr) goto bail;

            /* get the score for this document */
        err = FBCGetHitScore(session, i, &targetScore);
        if (err != noErr) goto bail;

            /* get a list of the words matched in
            this document */
        numTerms = maxHitTerms;
        err = FBCGetMatchedWords(session, i, &numTerms,
                                            &targetTerms);
        if (err != noErr) goto bail;

            /* call the call back routine provided as a
            parameter to do something with the information. */
        err = hitFileProc(&targetDoc, targetScore, numTerms,
                                            targetTerms);
        if (err != noErr) goto bail;

            /* clean up before moving to the next iteration. */
        FBCDestroyWordList(targetTerms, numTerms);
        targetTerms = NULL;

    }

    return noErr;

bail:
    if (targetTerms != NULL)
        FBCDestroyWordList(targetTerms, numTerms);
    return err;
}

Listing 10. Enumerating all of the files found in a search session.



Find By Content Reference

This section provides a description of the CFM-based interfaces to the PowerPC FBC library. PowerPC applications using these routines link against the library named "Find By Content" (without the quotes).



Data Types

FBC provides the following data types. Storage management for these types is provided by the FBC library. Clients should not attempt to allocate or deallocate these structures using calls to the Memory Manager.

FBCSearchSession



typedef struct OpaqueFBCSearchSession* FBCSearchSession;


Search sessions created by FBC are referenced through pointer variables of this type. The format of the data referred to by this pointer is private to the FBC library. Clients should not attempt to access or modify this data directly.

FBCWordItem



typedef char* FBCWordItem;


An ordinary C string. This type is used when retrieving information about hits from a search session.

FBCWordList



typedef FBCWordItem* FBCWordList;


An array of FBCWordItem values. This type is used when retrieving information about hits from a search session.



Allocation and Initialization of Search Sessions

The following routines can be used to allocate and dispose of search sessions. Storage occupied by search sessions is owned by the FBC library, and these are the only routines that should be used to allocate, copy, and dispose of search sessions.

FBCCreateSearchSession



OSErr FBCCreateSearchSession(
             FBCSearchSession* searchSession);


searchSession points to a variable of type FBCSearchSession.

FBCCreateSearchSession allocates a new search session and returns a reference to it in the variable pointed to by searchSession.

FBCDestroySearchSession



OSErr FBCDestroySearchSession(
             FBCSearchSession theSession);


theSession is a pointer to a search session.

FBCDestroySearchSession reclaims the storage occupied by a search session. This will include any volume configuration information and hits associated with the search session.

FBCCloneSearchSession



OSErr FBCCloneSearchSession(
             FBCSearchSession original,
             FBCSearchSession* clone);


original is a pointer to a search session.

clone points to a variable of type FBCSearchSession.

FBCCloneSearchSession creates a new search session and stores a pointer to it in the variable pointed to by the clone parameter. Information from the original search session that is copied to the new session includes the volumes targeted by the search session and all of the hits that may have been found in previous searches.



Configuring Search Sessions

Search sessions can be configured to limit searches to a particular set of volumes. These routines allow clients access to the set of volumes that will be searched by FBC.

FBCAddAllVolumesToSession



OSErr FBCAddAllVolumesToSession(
             FBCSearchSession theSession,
             Boolean includeRemote);


theSession is a pointer to a search session.

includeRemote is a Boolean value.

FBCAddAllVolumesToSession configures a search session to search all mounted volumes that have been indexed. If includeRemote is true, then remote volumes will be included in the search session's list of target volumes. Volumes that are not indexed are not added to the search session's list of target volumes.

FBCSetSessionVolumes



OSErr FBCSetSessionVolumes(
             FBCSearchSession theSession,
             const SInt16 *vRefNums,
             UInt16 numVolumes);


theSession is a pointer to a search session.

vRefNums is a pointer to an array of volume reference numbers (16-bit integers).

numVolumes is an integer value containing the number of volume reference numbers in the array vRefNums.

FBCSetSessionVolumes allows clients to add several volumes to the list of volumes targeted by a search session in a single call.


FBCAddVolumeToSession



OSErr FBCAddVolumeToSession(
             FBCSearchSession theSession,
             SInt16 vRefNum);


theSession is a pointer to a search session.

vRefNum is a volume reference number.

FBCAddVolumeToSession adds a volume to the list of volumes that will be searched by the search session. If the volume is not indexed, it will not be added to the list.

FBCRemoveVolumeFromSession



OSErr FBCRemoveVolumeFromSession(
             FBCSearchSession theSession,
             SInt16 vRefNum);


theSession is a pointer to a search session.

vRefNum is a volume reference number.

FBCRemoveVolumeFromSession removes the specified volume from the list of volumes that will be searched by the search session.

FBCGetSessionVolumeCount



OSErr FBCGetSessionVolumeCount(
             FBCSearchSession theSession,
             UInt16* count);


theSession is a pointer to a search session.

count is a pointer to a 16-bit integer where the result is to be stored.

FBCGetSessionVolumeCount returns, in *count, the number of volumes in the list of volumes that will be searched by the search session.

FBCGetSessionVolumes



OSErr FBCGetSessionVolumes(
             FBCSearchSession theSession,
             SInt16 *vRefNums,
             UInt16* numVolumes);


theSession is a pointer to a search session.

vRefNums is a pointer to an array of volume reference numbers (16-bit integers).

numVolumes is a pointer to a 16-bit integer. On input, this will be the number of elements that can be stored in vRefNums, and on output it will be the number of elements actually stored in vRefNums.

FBCGetSessionVolumes returns the list of volumes that will be searched by the search session in the array pointed to by vRefNums. *numVolumes is set to the number of volume reference numbers returned in the array.




Executing a Search

FBC provides three different routines for conducting searches that are described in this section.


FBCDoQuerySearch



OSErr FBCDoQuerySearch(
             FBCSearchSession theSession,
             char* queryText,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords);


theSession is a pointer to a search session.

queryText refers to a C-style string containing the search terms.

targetDirs points to an array of FSSpec records that refer to directories. If numTargets is zero, then this parameter can be set to NULL.

numTargets contains the number of FSSpec records in the array pointed to by targetDirs.

maxHits is the maximum number of hits that should be returned.

maxHitWords is the maximum number of hit words that will be reported.

FBCDoQuerySearch performs a search based on the search terms found in queryText. If the targetDirs parameter is present (numTargets is not zero), then only files residing in the directories specified in targetDirs will be included in the hits found by the search.

FBCDoExampleSearch



OSErr FBCDoExampleSearch(
             FBCSearchSession theSession,
             const UInt32* exampleHitNums,
             UInt32 numExamples,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords);


theSession contains a pointer to a search session. This session must contain a hit list generated by a previous search.

exampleHitNums points to an array of 32-bit integers.

numExamples contains the number of integers in the array pointed to by exampleHitNums.

targetDirs points to an array of FSSpec records that refer to directories. If numTargets is zero, then this parameter can be set to NULL.

numTargets contains the number of FSSpec records in the array pointed to by targetDirs.

maxHits is the maximum number of hits that should be returned.

maxHitWords is the maximum number of hit words that will be reported.

FBCDoExampleSearch performs an example-based or "similarity" search using hits found in a previous search as examples. exampleHitNums points to an array of long integers containing the indexes of one or more of the hits that are to be used as example files. If the targetDirs parameter is present (numTargets is not zero), then only files residing in the directories specified in targetDirs will be included in the hits found by the search.

FBCBlindExampleSearch



OSErr FBCBlindExampleSearch(
             FSSpec examples[ ],
             UInt32 numExamples,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords,
             Boolean allIndexes,
             Boolean includeRemote,
             FBCSearchSession* theSession);


examples is a pointer to an array of FSSpec records that refer to files. FBC will search for files that are similar to these files.

numExamples contains the number of FSSpec records in the array pointed to by examples.

targetDirs points to an array of FSSpec records that refer to directories. If targetDirs is not NULL and numTargets is not zero, then only files residing in these directories will be included in the hit list returned by the search. If numTargets is zero, then this parameter can be set to NULL.

numTargets contains the number of FSSpec records in the array pointed to by targetDirs.

maxHits is the maximum number of hits that should be returned.

maxHitWords is the maximum number of hit words that will be reported.

includeRemote is a Boolean value.

theSession points to a variable of type FBCSearchSession that will be created by this routine.

FBCBlindExampleSearch creates a new search session and conducts a similarity search using the files referred to in the array of FSSpec records provided in the examples parameter. If the targetDirs parameter is present (numTargets is not zero), then only files residing in the directories specified in targetDirs will be included in the hits found by the search. If includeRemote is true, then remote volumes will be included in the search session's list of target volumes.

If any of the example files are not indexed, then the search will proceed with the remainder of the files, and the error code kFBCsomeFilesNotIndexed will be returned. In this case, the search session will be created and a reference to it will be returned in *theSession.



Getting Information About Hits

Once a search is complete, a search session will contain a list of hits that were found during the search. The routines described in this section allow clients to access information about hits stored in a search session. Hit records are indexed 0 through count-1.

FBCGetHitCount



OSErr FBCGetHitCount(
             FBCSearchSession theSession,
             UInt32* count);


theSession is a pointer to a search session.

count is a pointer to a 32-bit integer.

FBCGetHitCount sets the variable pointed to by count to the number of hits in the search session. Hit records are indexed 0 through count-1.

FBCGetHitDocument



OSErr FBCGetHitDocument(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             FSSpec* theDocument);


theSession is a pointer to a search session.

hitNumber is an index value referring to a hit record in the search session.

theDocument is a pointer to a FSSpec record.

FBCGetHitDocument returns the FSSpec record for the hit in the search session whose index is hitNumber.


FBCGetHitScore



OSErr FBCGetHitScore(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             float* score);


theSession is a pointer to a search session.

hitNumber is an index value referring to a hit record in the search session.

score is a pointer to a variable of type float.

FBCGetHitScore returns the relevance score assigned to the hit in the search session whose index is hitNumber. The score is a direct measure of the document's relevance to the search criteria in the context of this particular search. Scores are normalized to the range 0.0 - 1.0, and the most relevant hit from every search always has a score of 1.0.
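Because scores are normalized, a client that wants to display relevance can map them directly onto a fixed scale. The sketch below converts a score to a whole percentage. ScoreToPercent is a hypothetical helper, not an FBC routine, and the clamping of out-of-range values is a defensive assumption rather than documented FBC behavior.

```c
#include <assert.h>

/* Hypothetical helper: maps a relevance score in 0.0 - 1.0 to a whole
   percentage, clamping anything outside that range (including NaN). */
int ScoreToPercent(float score)
{
    int percent;

    if (!(score > 0.0f)) return 0;      /* zero, negative, or NaN */
    if (score >= 1.0f) return 100;      /* the top hit, or out of range */

    percent = (int)(score * 100.0f + 0.5f);   /* round to nearest */
    assert(percent >= 0 && percent <= 100);
    return percent;
}
```

Since the best hit of every search scores 1.0, the percentages are only comparable within a single search, not across searches.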


FBCGetMatchedWords



OSErr FBCGetMatchedWords(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             UInt32* wordCount,
             FBCWordList* list);



theSession is a pointer to a search session.

hitNumber is an index value referring to a hit record in the search session.

wordCount is a pointer to a 32-bit integer.

list is a pointer to a variable of type FBCWordList.

FBCGetMatchedWords returns a list of matched words for the hit in the search session whose index is hitNumber. This list of words illustrates why the hit was returned. On return, *list will contain a pointer to a word list structure and *wordCount will be set to the number of entries in that structure. Be sure to call FBCDestroyWordList to dispose of the structure when you are done with it.

The matched words for a hit are stored in the hit itself, so retrieving them is fast.
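Since a word list is an array of ordinary C strings, it can be walked with plain pointer indexing. As an illustration, the sketch below flattens a word list into a comma-separated string for display. JoinWordList is a hypothetical helper, and the two typedefs are restated from the Data Types section only so that the code compiles without "FindByContent.h".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Restated from the Data Types section so the sketch compiles on its own. */
typedef char *FBCWordItem;
typedef FBCWordItem *FBCWordList;

/* Hypothetical helper: writes the words of list into buf as a
   comma-separated C string. Returns the number of words that fit. */
unsigned long JoinWordList(FBCWordList list, unsigned long wordCount,
                           char *buf, size_t bufSize)
{
    size_t used = 0;
    unsigned long i;

    assert(buf != NULL && bufSize > 0);
    buf[0] = '\0';
    if (list == NULL) return 0;

    for (i = 0; i < wordCount; i++) {
        size_t sepLen = (i > 0) ? 2 : 0;
        size_t wordLen;

        if (list[i] == NULL) break;
        wordLen = strlen(list[i]);

            /* stop before overflowing; keep room for the terminator */
        if (used + sepLen + wordLen + 1 > bufSize) break;

        if (sepLen) { memcpy(buf + used, ", ", 2); used += 2; }
        memcpy(buf + used, list[i], wordLen);
        used += wordLen;
        buf[used] = '\0';
    }
    return i;
}
```

The helper only reads the list. The list itself still belongs to FBC and must be released with FBCDestroyWordList once the client has finished with it.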

FBCGetTopicWords



OSErr FBCGetTopicWords(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             UInt32* wordCount,
             FBCWordList* list);


theSession is a pointer to a search session.

hitNumber is an index value referring to a hit record in the search session.

wordCount is a pointer to a 32-bit integer.

list is a pointer to a variable of type FBCWordList.

FBCGetTopicWords returns a list of topical words for the hit in the search session whose index is hitNumber. This list of words provides a clue about "what the document is about." On return, *list will contain a pointer to a word list structure and *wordCount will be set to the number of entries in that structure. Be sure to call FBCDestroyWordList to dispose of the structure when you are done with it.

The list of topical words for a particular hit must be generated through the index file, so this call is significantly slower than FBCGetMatchedWords.

FBCDestroyWordList



OSErr FBCDestroyWordList(
             FBCWordList theList,
             UInt32 wordCount);


theList is a pointer to a word list.

wordCount is the number of words in the list.

FBCDestroyWordList disposes of a word list allocated by either FBCGetMatchedWords or FBCGetTopicWords.

FBCReleaseSessionHits



OSErr FBCReleaseSessionHits(
             FBCSearchSession theSession);


theSession is a pointer to a search session. This session may contain hits generated by a search.

FBCReleaseSessionHits deallocates any information stored regarding hits from the last search from the search session. Volume configuration information is retained and once this call completes, the search session is ready to perform another search.



Summarizing Text

This call produces a summary containing the "most relevant" sentences found in the input text.

FBCSummarize



OSErr FBCSummarize(
             void* inBuf,
             UInt32 inLength,
             void* outBuf,
             UInt32* outLength,
             UInt32* numSentences);


inBuf points to the text to be summarized.

inLength is the length of the text pointed to by inBuf.

outBuf points to a buffer where the summary should be stored.

outLength is a pointer to a 32-bit integer. On input, this value is set to the size of the buffer pointed to by outBuf. On output, it is set to the actual length of the data stored in the buffer pointed to by outBuf.

numSentences is a pointer to a 32-bit integer. On input, this value is the maximum number of sentences desired in the summary. On output, it is set to the actual number of sentences generated. If numSentences is 0 on input, FBC takes the number of sentences in the input buffer and divides by 10. If the result is 0, then the value 1 is used as the maximum; otherwise, if the result is greater than 10, then the value 10 is used as the maximum.
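The default rule for numSentences is easy to misread, so it is restated here as code. This is a sketch of the rule as described above, not FBC's own implementation; DefaultSummarySentences is a hypothetical name, and integer division is assumed for "divides by 10".

```c
#include <assert.h>

/* Hypothetical helper that mirrors the documented default: one tenth of
   the input's sentence count, kept within the range 1 through 10. */
unsigned long DefaultSummarySentences(unsigned long inputSentences)
{
    unsigned long n = inputSentences / 10;

    if (n == 0) n = 1;      /* short inputs still get one sentence */
    if (n > 10) n = 10;     /* long inputs are capped at ten */

    assert(n >= 1 && n <= 10);
    return n;
}
```

Under this reading, a 45-sentence input yields a summary of at most 4 sentences, and any input of 100 sentences or more yields at most 10.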


Getting Information About Volumes

FBC provides the following utility routines for accessing information about volumes.

FBCVolumeIsIndexed



Boolean FBCVolumeIsIndexed (SInt16 theVRefNum);


theVRefNum is a volume reference number.

FBCVolumeIsIndexed returns true if the indicated volume has been indexed.

FBCVolumeIsRemote



Boolean FBCVolumeIsRemote(SInt16 theVRefNum);


theVRefNum is a volume reference number.

FBCVolumeIsRemote returns true if the indicated volume is located on a remote server. Clients may want to exclude networked volumes from searches to avoid network delays.

FBCVolumeIndexTimeStamp



OSErr FBCVolumeIndexTimeStamp(SInt16 theVRefNum,
             UInt32* timeStamp);


theVRefNum is a volume reference number.
timeStamp is a pointer to an unsigned 32-bit integer.

FBCVolumeIndexTimeStamp returns the time when the volume's index was last updated. The value returned in timeStamp is in the same format as values returned by GetDateTime.
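
GetDateTime reports time as the number of seconds since midnight, January 1, 1904. If a time stamp needs to be compared with values that count from January 1, 1970, a conversion along the following lines may be useful. This is a minimal sketch: the helper name MacSecondsTo1970Seconds and the UInt32 stand-in typedef are assumptions, and no time zone adjustment is attempted (the Mac OS clock holds local time).

```c
#include <stdint.h>

/* Stand-in for the Mac OS UInt32 type so the sketch compiles anywhere. */
typedef uint32_t UInt32;

/* Seconds from midnight, January 1, 1904 to midnight, January 1, 1970:
   24107 days (66 years including 17 leap days) of 86400 seconds each. */
#define kSecondsFrom1904To1970 2082844800LL

/*
 * Hypothetical helper (not part of FBC): convert a time stamp in
 * GetDateTime format to seconds relative to January 1, 1970.
 * The result is negative for dates before 1970.
 */
static int64_t MacSecondsTo1970Seconds(UInt32 macSeconds)
{
    return (int64_t)macSeconds - kSecondsFrom1904To1970;
}
```

A signed 64-bit result is used so that time stamps earlier than 1970 do not wrap around.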

FBCVolumeIndexPhysicalSize



OSErr FBCVolumeIndexPhysicalSize(SInt16 theVRefNum,
             UInt32* size);


theVRefNum is a volume reference number.
size is a pointer to an unsigned 32-bit integer.


FBCVolumeIndexPhysicalSize returns the size of the volume's index file in bytes.


Indexing Volumes, Folders, and Files

A new API has been added to Find By Content allowing for the immediate indexing of new or altered files. The new routine is declared as follows:

FBCIndexItems



OSErr FBCIndexItems(
     FSSpecArrayPtr theItems,
     UInt32 itemCount);


theItems is a pointer to an array of file specification records referring to the files to be indexed.
itemCount is the number of items in the array of file specification records.

FBCIndexItems indexes (or re-indexes) the files referred to by the array of file specification records passed in theItems. If the volume containing a file already has an index, the file is added to it or re-indexed; if the volume has no index, a new index is created.

Normally, you call FBCIndexItems after saving or updating a file on a volume that contains an index. This keeps the user's indexes up to date without any additional effort on the user's part. To determine whether a volume contains an index, call FBCVolumeIsIndexed.

COMPATIBILITY NOTE
The symbol FBCIndexItems is not exported from the original version of the "Find By Content" shared library. Applications wishing to use this routine should weak-link to this symbol and then test for its presence before attempting to call it. Techniques for doing this are described in Technote TN1083, "Weak-Linking to a CFM-based Shared Library."
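
The presence test can be pictured as a guard around the call. The following is a sketch under stated assumptions rather than the technique from TN1083 itself: the types are local stand-ins for the Mac OS declarations, IndexItemsIfAvailable and kMyIndexingUnavailableErr are hypothetical names, and an unresolved weak-linked symbol is assumed to appear as a null address.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for Mac OS types so the sketch compiles on its own. */
typedef int16_t  OSErr;
typedef uint32_t UInt32;
typedef struct FSSpec FSSpec;          /* opaque here; defined by the OS */
typedef FSSpec *FSSpecArrayPtr;

/* Pointer type matching the FBCIndexItems declaration in this Note. */
typedef OSErr (*IndexItemsProcPtr)(FSSpecArrayPtr theItems,
                                   UInt32 itemCount);

/* Hypothetical application-defined result; not an FBC error code. */
enum { kMyIndexingUnavailableErr = -1 };

/*
 * Hypothetical wrapper: call the indexing routine only if its symbol
 * resolved. In an application, indexItems would be the weak-linked
 * FBCIndexItems symbol itself.
 */
static OSErr IndexItemsIfAvailable(IndexItemsProcPtr indexItems,
                                   FSSpecArrayPtr theItems,
                                   UInt32 itemCount)
{
    if (indexItems == NULL)            /* symbol absent in this library */
        return kMyIndexingUnavailableErr;

    return indexItems(theItems, itemCount);
}
```

Routing every call through one guard of this kind keeps the availability check in a single place, so the rest of the application never calls through an unresolved symbol.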


Reserving Heap Space

Clients of FBC can reserve space in their heap zone for their callback routine before conducting a search.

FBCSetHeapReservation



void FBCSetHeapReservation(UInt32 bytes);


bytes is an integer value containing the number of bytes that should be reserved.

FBCSetHeapReservation sets the number of bytes that FBC guarantees are available in the client application's heap whenever the client's callback routine is called during a search. If you do not reserve heap space explicitly by calling this routine, 200K is reserved for you.


Application-Defined Routine

Clients can provide a routine that is called periodically during searches. This routine gives clients both information about the status of a search and an opportunity to cancel the search before it is complete.

Callback routines are defined as follows:


FBCCallbackProcPtr



typedef Boolean (*FBCCallbackProcPtr)(
             UInt16 phase,
             float percentDone,
             void *data);


phase is a 16-bit integer containing one of the following constants indicating the current status of the search:



    enum {
        kFBCphSearching             = 6,
        kFBCphMakingAccessAccessor  = 7,
        kFBCphAccessWaiting         = 8,
        kFBCphSummarizing           = 9,
        kFBCphIdle                  = 10,
        kFBCphCanceling             = 11
    };


percentDone is a progress value in the range 0.0 to 1.0.
data contains the same value provided to FBCSetCallback in the data parameter.

To avoid locking up the system while a search is in progress, the callback should either directly or indirectly call WaitNextEvent.

An ongoing search is canceled if the callback function returns true.
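
A callback matching this definition might look like the following sketch. The SearchProgress structure and the name MySearchCallback are hypothetical, the types are local stand-ins so that the example compiles on its own, and the WaitNextEvent call that a real callback needs is indicated only by a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for Mac OS types so the sketch compiles on its own. */
typedef unsigned char Boolean;
typedef uint16_t      UInt16;

/* Phase codes as listed in this Note. */
enum {
    kFBCphSearching            = 6,
    kFBCphMakingAccessAccessor = 7,
    kFBCphAccessWaiting        = 8,
    kFBCphSummarizing          = 9,
    kFBCphIdle                 = 10,
    kFBCphCanceling            = 11
};

/* Hypothetical client state, passed through the data parameter. */
typedef struct SearchProgress {
    UInt16  lastPhase;      /* most recent phase reported */
    float   lastPercent;    /* most recent progress, 0.0 to 1.0 */
    Boolean userCanceled;   /* set elsewhere when the user cancels */
} SearchProgress;

/* Records progress and asks FBC to cancel once userCanceled is set. */
static Boolean MySearchCallback(UInt16 phase, float percentDone, void *data)
{
    SearchProgress *progress = (SearchProgress *)data;

    if (progress == NULL)
        return 0;                       /* no state: keep searching */

    progress->lastPhase   = phase;
    progress->lastPercent = percentDone;

    /* A real callback would call WaitNextEvent here (directly or
       indirectly) so the system stays responsive during the search. */

    return progress->userCanceled;      /* true cancels the search */
}
```

The address of a SearchProgress record would be supplied as the data argument when the callback is installed, so the same record is handed back on every call.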

FBCSetCallback



void FBCSetCallback(FBCCallbackProcPtr fn, void* data);


fn is a pointer to your callback function.
data is a value passed through to your callback function.

FBCSetCallback sets the callback function that is called during searches. If a client does not define a callback function, the default callback function is used. The default callback function calls WaitNextEvent and returns false.


Find By Content C Summary

Constants

enum {
    gestaltFBCIndexingState      = 'fbci',
    gestaltFBCindexingSafe       = 0,
    gestaltFBCindexingCritical   = 1
};


enum {
    gestaltFBCVersion            = 'fbcv',
    gestaltFBCCurrentVersion     = 0x0011
};

enum { /* error codes */
    kFBCvTwinExceptionErr     = -30500,
                    /* miscellaneous error */
    kFBCnoIndexesFound        = -30501,
    kFBCallocFailed           = -30502,
                    /*probably low memory*/
    kFBCbadParam              = -30503,
    kFBCfileNotIndexed        = -30504,
    kFBCbadIndexFile          = -30505,
                    /*bad FSSpec, or bad data in file*/
    kFBCtokenizationFailed    = -30512,
                    /*couldn't read from document or query*/
    kFBCindexNotFound         = -30518,
    kFBCnoSearchSession       = -30519,
    kFBCaccessCanceled        = -30521,
    kFBCindexNotAvailable     = -30523,
    kFBCsearchFailed          = -30524,
    kFBCsomeFilesNotIndexed   = -30525,
    kFBCillegalSessionChange  = -30526,
                    /*tried to add/remove vols */
                    /*to a session  that has hits*/
    kFBCanalysisNotAvailable  = -30527,
    kFBCbadIndexFileVersion   = -30528,
    kFBCsummarizationCanceled = -30529,
    kFBCbadSearchSession      = -30531,
    kFBCnoSuchHit             = -30532
};

enum { /* codes sent to the callback routine */
    kFBCphSearching             = 6,
    kFBCphMakingAccessAccessor  = 7,
    kFBCphAccessWaiting         = 8,
    kFBCphSummarizing           = 9,
    kFBCphIdle                  = 10,
    kFBCphCanceling             = 11
};


Data Types

    /* A collection of state information for searching*/
typedef struct OpaqueFBCSearchSession* FBCSearchSession;

    /* An ordinary C string (used for hit/doc terms)*/
typedef char* FBCWordItem;

    /* An array of WordItems*/
typedef FBCWordItem* FBCWordList;


Allocation and Initialization of Search Sessions

OSErr FBCCreateSearchSession(
             FBCSearchSession* searchSession);
OSErr FBCDestroySearchSession(
             FBCSearchSession theSession);
OSErr FBCCloneSearchSession(
             FBCSearchSession original,
             FBCSearchSession* clone);


Configuring Search Sessions

OSErr FBCAddAllVolumesToSession(
             FBCSearchSession theSession,
             Boolean includeRemote);
OSErr FBCSetSessionVolumes(
             FBCSearchSession theSession,
             const SInt16 vRefNums[ ],
             UInt16 numVolumes);
OSErr FBCAddVolumeToSession(
             FBCSearchSession theSession,
             SInt16 vRefNum);
OSErr FBCRemoveVolumeFromSession(
             FBCSearchSession theSession,
             SInt16 vRefNum);
OSErr FBCGetSessionVolumeCount(
             FBCSearchSession theSession,
             UInt16* count);
OSErr FBCGetSessionVolumes(
             FBCSearchSession theSession,
             SInt16 vRefNums[ ],
             UInt16* numVolumes);


Executing a Search

OSErr FBCDoQuerySearch(
             FBCSearchSession theSession,
             char* queryText,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords);
OSErr FBCDoExampleSearch(
             FBCSearchSession theSession,
             const UInt32* exampleHitNums,
             UInt32 numExamples,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords);
OSErr FBCBlindExampleSearch(
             FSSpec examples[ ],
             UInt32 numExamples,
             const FSSpec targetDirs[ ],
             UInt32 numTargets,
             UInt32 maxHits,
             UInt32 maxHitWords,
             Boolean allIndexes,
             Boolean includeRemote,
             FBCSearchSession* theSession);


Getting Information About Hits

OSErr FBCGetHitCount(
             FBCSearchSession theSession,
             UInt32* count);
OSErr FBCGetHitDocument(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             FSSpec* theDocument);
OSErr FBCGetHitScore(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             float* score);
OSErr FBCGetMatchedWords(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             UInt32* wordCount,
             FBCWordList* list);
OSErr FBCGetTopicWords(
             FBCSearchSession theSession,
             UInt32 hitNumber,
             UInt32* wordCount,
             FBCWordList* list);
OSErr FBCDestroyWordList(
             FBCWordList theList,
             UInt32 wordCount);
OSErr FBCReleaseSessionHits(
             FBCSearchSession theSession);


Summarizing Text

OSErr FBCSummarize(
             void* inBuf,
             UInt32 inLength,
             void* outBuf,
             UInt32* outLength,
             UInt32* numSentences);


Getting Information About Volumes

Boolean FBCVolumeIsIndexed (SInt16 theVRefNum);
Boolean FBCVolumeIsRemote(SInt16 theVRefNum);
OSErr FBCVolumeIndexTimeStamp(SInt16 theVRefNum,
             UInt32* timeStamp);
OSErr FBCVolumeIndexPhysicalSize(SInt16 theVRefNum,
             UInt32* size);


Indexing Volumes, Folders, and Files

OSErr FBCIndexItems(
             FSSpecArrayPtr theItems,
             UInt32 itemCount);


Reserving Heap Space

void FBCSetHeapReservation(UInt32 bytes);



Application-Defined Routine

typedef Boolean (*FBCCallbackProcPtr)(
             UInt16 phase,
             float percentDone,
             void *data);
void FBCSetCallback(FBCCallbackProcPtr fn, void* data);



References

Technote TN1141, "Extending and Controlling Sherlock"

Technote TN1181, "Sherlock's Find by Content Text Extractor Plug-ins."
