LatentSemanticMapping.h |
| Include Path: | <LatentSemanticMapping/LatentSemanticMapping.h> |
| Path: | /System/Library/Frameworks/LatentSemanticMapping.framework/Versions/A/Headers/LatentSemanticMapping.h |
| Includes: | <CoreFoundation/CoreFoundation.h> <CoreServices/CoreServices.h> <Carbon/Carbon.h> <stdio.h> <stdint.h> |
This header contains the Latent Semantic Mapping API, which supports the classification of text and other token-based content, based on latent semantic information. For example, you can use the functions of the LSM API to classify an email message as consistent or inconsistent with a user's interests.
The LSM API defines the following entities:
LSMMapAddCategory |
Adds a category to the specified map.
LSMCategory LSMMapAddCategory( LSMMapRef mapref);
Parameters: mapref
Return Value: The category identifier of the category added to the map. See Error_Codes for error codes that might be returned.
LSMMapAddText |
Adds a training text to the specified category.
OSStatus LSMMapAddText( LSMMapRef mapref, LSMTextRef textref, LSMCategory category);
Parameters: mapref, textref, category
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
The textref parameter is not needed after this call, and can be released.
LSMMapAddTextWithWeight |
Adds a training text to the given category, with a weight other than 1.
OSStatus LSMMapAddTextWithWeight( LSMMapRef mapref, LSMTextRef textref, LSMCategory category, float weight);
Parameters: mapref, textref, category, weight
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
Although the value of the weight parameter may be negative, no word will receive a total count less than 0. The textref parameter is not needed after this call, and can be released.
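The clamping behavior described above can be pictured with a small sketch. The helper below is hypothetical and is not part of the LSM API; it only models the documented rule that a negative weight can lower a word's total count but never push it below 0. How LSM accumulates counts internally is not stated in this reference.

```c
/* Hypothetical helper, not part of the LSM API: adds a weighted number of
 * occurrences to a running total and clamps the result at zero, mirroring
 * the documented rule that no word receives a total count less than 0. */
float add_weighted_count(float total, float occurrences, float weight)
{
    float updated = total + occurrences * weight;
    return updated < 0.0f ? 0.0f : updated;
}
```

For instance, adding one occurrence with a weight of -5 to a total of 2 leaves the total at 0 rather than -3.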
LSMMapApplyClusters |
Groups categories, words, or tokens into the specified sets of clusters.
OSStatus LSMMapApplyClusters( LSMMapRef mapref, CFArrayRef clusters);
Parameters: mapref, clusters
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
LSMMapCompile |
Compiles the map into executable form.
OSStatus LSMMapCompile( LSMMapRef mapref);
Parameters: mapref
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
This function puts the map into mapping mode and prepares it for the classification of texts and other token-based content. Note that this function is computationally expensive.
LSMMapCreate |
Creates a new LSM map.
LSMMapRef LSMMapCreate( CFAllocatorRef alloc, CFOptionFlags flags);
Parameters: alloc, flags
Return Value: An LSM map. See Error_Codes for error codes that might be returned.
Call CFRelease to dispose of the map.
LSMMapCreateClusters |
Computes a set of clusters, grouping similar categories, words, or tokens.
CFArrayRef LSMMapCreateClusters( CFAllocatorRef alloc, LSMMapRef mapref, CFArrayRef subset, CFIndex numClusters, CFOptionFlags flags);
Parameters: alloc, mapref, subset, numClusters, flags
Return Value: An array containing the specified number of clusters.
If subset is non-NULL, only perform clustering on the categories, words, or tokens listed.
LSMMapCreateFromURL |
Loads a map from the specified file.
LSMMapRef LSMMapCreateFromURL( CFAllocatorRef alloc, CFURLRef file, CFOptionFlags flags);
Parameters: alloc, file, flags
The flags parameter accepts the following options:
- kLSMMapDiscardCounts (defined in Storage_Flags). Note that if you pass this flag, the map will need to be reloaded with the kLSMMapLoadMutable option instead before calling LSMMapStartTraining.
- kLSMMapLoadMutable (defined in Storage_Flags)
Return Value: The LSM map loaded from the specified file. See Error_Codes for error codes that might be returned.
LSMMapGetCategoryCount |
Returns the number of categories in the specified map.
CFIndex LSMMapGetCategoryCount( LSMMapRef mapref);
Parameters: mapref
Return Value: The number of categories in the map. See Error_Codes for error codes that might be returned.
LSMMapGetProperties |
Gets the dictionary of properties for the map.
CFDictionaryRef LSMMapGetProperties( LSMMapRef mapref);
Parameters: mapref
Return Value: A CFDictionary of properties associated with the map. See Error_Codes for error codes that might be returned.
Because LSM retains ownership of the dictionary this function returns, do not release it. See LSM_Map_Properties for information on these properties.
LSMMapGetTypeID |
Returns the Core Foundation type identifier for LSM maps.
CFTypeID LSMMapGetTypeID( void);
LSMMapSetProperties |
Sets a dictionary of properties for the map.
void LSMMapSetProperties( LSMMapRef mapref, CFDictionaryRef properties);
Parameters: mapref, properties
Because LSM makes its own copy of these properties, there's no need to retain them past this call.
LSMMapSetStopWords |
Specifies words to be omitted from all classification efforts.
OSStatus LSMMapSetStopWords( LSMMapRef mapref, LSMTextRef textref);
Parameters: mapref, textref
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
If you use this function, you must call it before any calls to LSMMapAddText. The textref parameter is not needed after this call, and can be released.
LSMMapStartTraining |
Puts the map into training mode, preparing it for the addition of more categories or texts or both.
OSStatus LSMMapStartTraining( LSMMapRef mapref);
Parameters: mapref
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
This function is somewhat expensive, because it requires substantial data structure reorganization.
LSMMapWriteToStream |
Writes information about a map or a text or both to a byte stream in text form.
OSStatus LSMMapWriteToStream( LSMMapRef mapref, LSMTextRef textref, CFWriteStreamRef stream, CFOptionFlags options);
Parameters: mapref, textref, stream, options
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
LSMMapWriteToURL |
Compiles the map if necessary and stores it into the specified file.
OSStatus LSMMapWriteToURL( LSMMapRef mapref, CFURLRef file, CFOptionFlags flags);
Parameters: mapref, file, flags
The flags parameter accepts the following options:
- kLSMMapDiscardCounts (defined in Storage_Flags). Note that if you pass this flag you save disk space, but the map as stored can't be retrained (you can't call LSMMapStartTraining on it).
- kLSMMapHashText (defined in Map_Flags). Note that if you pass this flag, the map will be hashed if it hasn't been hashed yet.
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
LSMResultCopyToken |
Returns the token for the n-th best (zero-based) result.
CFDataRef LSMResultCopyToken( LSMResultRef mapref, CFIndex n);
Parameters: mapref, n
Return Value: The token for the n-th best result. See Error_Codes for error codes that might be returned.
LSMResultCopyTokenCluster |
Returns the cluster of tokens for the n-th best (zero-based) result.
CFArrayRef LSMResultCopyTokenCluster( LSMResultRef mapref, CFIndex n);
Parameters: mapref, n
Return Value: An array containing the cluster of tokens for the n-th best result. See Error_Codes for error codes that might be returned.
LSMResultCopyWord |
Returns the word for the n-th best (zero-based) result.
CFStringRef LSMResultCopyWord( LSMResultRef result, CFIndex n);
Parameters: result, n
Return Value: The word for the n-th best result. See Error_Codes for error codes that might be returned.
LSMResultCopyWordCluster |
Returns the cluster of words for the n-th best (zero-based) result.
CFArrayRef LSMResultCopyWordCluster( LSMResultRef result, CFIndex n);
Parameters: result, n
Return Value: An array containing the cluster of words for the n-th best result. See Error_Codes for error codes that might be returned.
LSMResultCreate |
Returns, in decreasing order of likelihood, the categories or words that best match when a text is mapped into a map.
LSMResultRef LSMResultCreate( CFAllocatorRef alloc, LSMMapRef mapref, LSMTextRef textref, CFIndex numResults, CFOptionFlags flags);
Parameters:
- alloc
- mapref
- textref
- numResults
- flags: Pass kLSMResultBestWords (defined in Result_Flags) to find words.
Return Value: An LSMResultRef value. See Error_Codes for error codes that might be returned.
This function categorizes the input text and returns an LSMResultRef value that represents up to numResults categories (or, optionally, numResults words) that best match, in decreasing order.
LSMResultGetCategory |
Returns the category of the n-th best (zero-based) result.
LSMCategory LSMResultGetCategory( LSMResultRef result, CFIndex n);
Parameters: result, n
Return Value: The LSM category of the n-th best (zero-based) result. See Error_Codes for error codes that might be returned.
LSMResultGetCount |
Returns the number of results associated with the specified result.
CFIndex LSMResultGetCount( LSMResultRef result);
Parameters: result
Return Value: The number of LSM results actually created. See Error_Codes for error codes that might be returned.
LSMResultGetScore |
Returns the likelihood of the n-th best (zero-based) result.
float LSMResultGetScore( LSMResultRef result, CFIndex n);
Parameters: result, n
Return Value: The likelihood of the n-th best result, as a floating point value. See Error_Codes for error codes that might be returned.
LSMResultGetTypeID |
Returns the Core Foundation type identifier for LSM results.
CFTypeID LSMResultGetTypeID( void);
LSMTextAddToken |
Adds an arbitrary binary token to the text.
OSStatus LSMTextAddToken( LSMTextRef textref, CFDataRef token);
Parameters: textref, token
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
The order of tokens is significant if the map uses pairs or triplets. The count of tokens is always significant.
LSMTextAddWord |
Adds a word to the text.
OSStatus LSMTextAddWord( LSMTextRef textref, CFStringRef word);
Parameters: textref, word
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
The order of words is significant if the map uses pairs or triplets. The count of words is always significant.
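To see why order matters once pairs are in play, consider the sketch below. It assumes, for illustration only, that a "pair" means two adjacent words; this reference does not say how LSM forms pairs internally, and count_adjacent_pair is a hypothetical helper rather than an LSM function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the LSM API: counts how often `first`
 * is immediately followed by `second` in a sequence of words. Treating a
 * pair as two adjacent words is an assumption made for this sketch. */
size_t count_adjacent_pair(const char *const *words, size_t count,
                           const char *first, const char *second)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < count; i++) {
        if (strcmp(words[i], first) == 0 && strcmp(words[i + 1], second) == 0)
            hits++;
    }
    return hits;
}
```

Under that assumption, the sequences "buy cheap watches" and "watches cheap buy" contain the same single words with the same counts, but only the first contains the pair ("buy", "cheap").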
LSMTextAddWords |
Breaks a string into words using the specified locale, and adds the words to the text.
OSStatus LSMTextAddWords( LSMTextRef textref, CFStringRef words, CFLocaleRef locale, CFOptionFlags flags);
Parameters:
- textref
- words
- locale: The locale used to break the words string. Pass NULL to get the default locale.
- flags: How the words should be mapped. See Parsing_Flags for available options.
Return Value: noErr if successful. See Error_Codes for error codes that might be returned.
LSMTextCreate |
Creates a new text.
LSMTextRef LSMTextCreate( CFAllocatorRef alloc, LSMMapRef mapref);
Parameters: alloc, mapref
Return Value: The text created from the specified map. See Error_Codes for error codes that might be returned.
LSMTextGetTypeID |
Returns the Core Foundation type identifier for LSM texts.
CFTypeID LSMTextGetTypeID( void);
LSMCategory |
typedef uint32_t LSMCategory;
An integral type representing a category.
LSMMapRef |
typedef struct __LSMMap * LSMMapRef;
An opaque Core Foundation type representing an LSM map (mutable).
LSMResultRef |
typedef struct __LSMResult * LSMResultRef;
An opaque Core Foundation type representing the result of a lookup (immutable).
LSMTextRef |
typedef struct __LSMText * LSMTextRef;
An opaque Core Foundation type representing an input text (mutable).
Clustering_Flags |
Options for LSMMapCreateClusters.
enum { kLSMClusterCategories = 0, kLSMClusterWords = 1, kLSMClusterTokens = 2, kLSMClusterKMeans = 0, kLSMClusterAgglomerative = 4 };
kLSMClusterCategories- Cluster categories.
kLSMClusterWords- Cluster words.
kLSMClusterTokens- Cluster binary tokens.
kLSMClusterKMeans- Cluster using k-Means algorithm.
kLSMClusterAgglomerative- Cluster using agglomerative algorithm.
The first 3 flags specify the type of cluster and the last 2 specify the algorithm to be used. In LSMMapCreateClusters, you should OR a cluster-type flag with an algorithm flag for the flags parameter.
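A minimal sketch of how the flag values combine follows. The enum is reproduced from the listing above so that the example compiles without the framework (omit it when including the real header), and the three helper functions are hypothetical, not part of the LSM API. CFOptionFlags is modeled here as unsigned long.

```c
/* Values reproduced from Clustering_Flags so this sketch compiles without
 * the framework; omit this enum when including the real header. */
enum {
    kLSMClusterCategories    = 0,
    kLSMClusterWords         = 1,
    kLSMClusterTokens        = 2,
    kLSMClusterKMeans        = 0,
    kLSMClusterAgglomerative = 4
};

/* Hypothetical helper: combines one cluster-type flag with one algorithm flag. */
unsigned long make_cluster_flags(unsigned long type, unsigned long algorithm)
{
    return type | algorithm;
}

/* Hypothetical helper: extracts the cluster-type portion (the low two bits). */
unsigned long cluster_type(unsigned long flags)
{
    return flags & 3UL;
}

/* Hypothetical helper: reports whether the agglomerative algorithm was requested. */
int uses_agglomerative(unsigned long flags)
{
    return (flags & kLSMClusterAgglomerative) != 0;
}
```

Because kLSMClusterCategories and kLSMClusterKMeans are both 0, passing 0 for flags selects k-Means clustering of categories.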
Error_Codes |
Error codes that may be returned from LSM routines.
enum { kLSMMapOutOfState = -6640, kLSMMapNoSuchCategory = -6641, kLSMMapWriteError = -6642, kLSMMapBadPath = -6643, kLSMMapBadCluster = -6644 };
kLSMMapOutOfState- This call cannot be issued in this map state.
kLSMMapNoSuchCategory- Invalid category specified.
kLSMMapWriteError- An error occurred writing the map.
kLSMMapBadPath- The specified URL does not exist.
kLSMMapBadCluster- The specified clusters are invalid.
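When logging failures, it can help to translate these codes into the descriptions given above. The following is a sketch only: lsm_error_description is a hypothetical helper, not an LSM function, the enum is reproduced from the listing so the example compiles without the framework, and the status is modeled as int32_t in place of OSStatus.

```c
#include <stdint.h>

/* Values reproduced from Error_Codes so this sketch compiles without the
 * framework; omit this enum when including the real header. */
enum {
    kLSMMapOutOfState     = -6640,
    kLSMMapNoSuchCategory = -6641,
    kLSMMapWriteError     = -6642,
    kLSMMapBadPath        = -6643,
    kLSMMapBadCluster     = -6644
};

/* Hypothetical helper, not part of the LSM API: maps a status code to the
 * description given in this reference. */
const char *lsm_error_description(int32_t status)
{
    switch (status) {
    case 0:                     return "No error.";
    case kLSMMapOutOfState:     return "This call cannot be issued in this map state.";
    case kLSMMapNoSuchCategory: return "Invalid category specified.";
    case kLSMMapWriteError:     return "An error occurred writing the map.";
    case kLSMMapBadPath:        return "The specified URL does not exist.";
    case kLSMMapBadCluster:     return "The specified clusters are invalid.";
    default:                    return "Unknown status.";
    }
}
```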
Map_Flags |
Options that can be specified for LSMMapCreate.
enum { kLSMMapPairs = 1, kLSMMapTriplets = 2, kLSMMapHashText = 256 };
kLSMMapPairs- Use pairs in addition to single words.
kLSMMapTriplets- Use triplets in addition to single words.
kLSMMapHashText- Transform the text so it's not trivially human-readable. Note that this prevents the map from being used to generate speech-recognition language models.
These options can improve mapping accuracy, at a potentially significant increase in memory usage.
Parsing_Flags |
Options you can specify for LSMTextAddWords.
enum { kLSMTextPreserveCase = 1, kLSMTextPreserveAcronyms = 2, kLSMTextApplySpamHeuristics = 4 };
kLSMTextPreserveCase- Don't map any uppercase characters to lowercase.
kLSMTextPreserveAcronyms- Don't map words consisting of all uppercase characters to lowercase. By default, all uppercase characters in the input text are mapped to lowercase characters. (Note that if kLSMTextPreserveCase is specified, no mapping to lowercase characters is performed at all.)
kLSMTextApplySpamHeuristics- Parse the text with a heuristic algorithm that assumes that the text is junk mail designed to confuse naive word parsers.
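The interaction between the two case flags can be modeled roughly as follows. This is an ASCII-only sketch of the documented behavior, not LSM's actual parser: the helpers are hypothetical, the enum is reproduced from the listing so the example compiles without the framework, and treating a word as "all uppercase" when it has at least one uppercase letter and no lowercase letters is an assumption.

```c
#include <ctype.h>

/* Values reproduced from Parsing_Flags so this sketch compiles without the
 * framework; omit this enum when including the real header. */
enum {
    kLSMTextPreserveCase        = 1,
    kLSMTextPreserveAcronyms    = 2,
    kLSMTextApplySpamHeuristics = 4
};

/* Hypothetical helper: nonzero if the word has at least one uppercase
 * letter and no lowercase letters (an assumed definition of "acronym"). */
int is_all_uppercase(const char *word)
{
    int saw_upper = 0;
    for (const char *p = word; *p != '\0'; p++) {
        if (islower((unsigned char)*p))
            return 0;
        if (isupper((unsigned char)*p))
            saw_upper = 1;
    }
    return saw_upper;
}

/* Hypothetical helper: lowercases `word` in place unless a flag says to
 * keep its case. */
void normalize_word_case(char *word, unsigned long flags)
{
    if (flags & kLSMTextPreserveCase)
        return;                     /* no lowercase mapping at all */
    if ((flags & kLSMTextPreserveAcronyms) && is_all_uppercase(word))
        return;                     /* keep all-uppercase words as they are */
    for (char *p = word; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
}
```

With no flags, both "NASA" and "Hello" are lowercased; with kLSMTextPreserveAcronyms, "NASA" is kept while "Hello" is still lowercased; with kLSMTextPreserveCase, neither is changed.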
Result_Flags |
enum { kLSMResultBestWords = 1, };
kLSMResultBestWords- Find the words, rather than categories, that best match.
Options for LSMResultCreate.
Storage_Flags |
enum { kLSMMapDiscardCounts = 1, kLSMMapLoadMutable = 2 };
kLSMMapDiscardCounts- Don't keep counts. This option can save a lot of memory and/or disk space. See the usage notes in LSMMapCreateFromURL and LSMMapWriteToURL.
kLSMMapLoadMutable- Load map as mutable in training form instead of executable form. You might choose to use this flag to save time and memory if you plan to retrain the map right after storage.
Storage and loading options for LSMMapCreateFromURL and LSMMapWriteToURL.
kLSMAlgorithmDense |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMAlgorithmDense CFSTR("LSMAlgorithmDense")
Perform an SVD on a dense map (in a dense map, most words occur in most categories).
kLSMAlgorithmKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMAlgorithmKey CFSTR("LSMAlgorithm")
The algorithm to be used.
kLSMAlgorithmSparse |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMAlgorithmSparse CFSTR("LSMAlgorithmSparse")
Perform an SVD on a sparse map (in a sparse map, most words occur in only a small subset of categories).
kLSMDimensionKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMDimensionKey CFSTR("LSMDimension")
The maximum number of dimensions to use, as a CFNumber value (this defaults to the number of categories). Often, a lower dimension is appropriate, especially for the sparse algorithm.
kLSMIterationsKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMIterationsKey CFSTR("LSMIterations")
The number of iterations to use in the sparse algorithm, as a CFNumber value (this defaults to a number based on the number of dimensions in the map and rarely needs to be changed).
kLSMPrecisionDouble |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMPrecisionDouble CFSTR("LSMPrecisionDouble")
Use double precision floating point when performing computations (default for sparse map).
kLSMPrecisionFloat |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMPrecisionFloat CFSTR("LSMPrecisionFloat")
Use single precision floating point when performing computations (default for a dense map).
kLSMPrecisionKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMPrecisionKey CFSTR("LSMPrecision")
The precision to be used when performing computations.
kLSMSweepAgeKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMSweepAgeKey CFSTR("LSMSweepAge")
The number of days between sweeping generations (this defaults to 7).
kLSMSweepCutoffKey |
Predefined keys and properties for LSM maps.
See Also:
- LSM_Map_Properties
#define kLSMSweepCutoffKey CFSTR("LSMSweepCutoff")
When used, a CFNumber value that causes the map to be scanned every kLSMSweepAgeKey days, removing each entry older than 3 times the value of kLSMSweepAgeKey, whose total weight in the map is less than kLSMSweepCutoffKey.
LSM_Map_Properties |
Predefined keys and properties for LSM maps.
See Also:
- kLSMAlgorithmKey
- kLSMAlgorithmDense
- kLSMAlgorithmSparse
- kLSMPrecisionKey
- kLSMPrecisionFloat
- kLSMPrecisionDouble
- kLSMDimensionKey
- kLSMIterationsKey
- kLSMSweepAgeKey
- kLSMSweepCutoffKey
#define kLSMAlgorithmKey CFSTR("LSMAlgorithm") //! The algorithm to be used.
#define kLSMAlgorithmDense CFSTR("LSMAlgorithmDense") //! Perform an SVD on a dense map (in a dense map, most words occur in most categories).
#define kLSMAlgorithmSparse CFSTR("LSMAlgorithmSparse") //! Perform an SVD on a sparse map (in a sparse map, most words occur in only a small subset of categories).
#define kLSMPrecisionKey CFSTR("LSMPrecision") //! The precision to be used when performing computations.
#define kLSMPrecisionFloat CFSTR("LSMPrecisionFloat") //! Use single precision floating point when performing computations (default for a dense map).
#define kLSMPrecisionDouble CFSTR("LSMPrecisionDouble") //! Use double precision floating point when performing computations (default for sparse map).
#define kLSMDimensionKey CFSTR("LSMDimension") //! The maximum number of dimensions to use, as a CFNumber value (this defaults to the number of categories). Often, a lower dimension is appropriate, especially for the sparse algorithm.
#define kLSMIterationsKey CFSTR("LSMIterations") //! The number of iterations to use in the sparse algorithm, as a CFNumber value (this defaults to a number based on the number of dimensions in the map and rarely needs to be changed).
#define kLSMSweepAgeKey CFSTR("LSMSweepAge") //! The number of days between sweeping generations (this defaults to 7).
#define kLSMSweepCutoffKey CFSTR("LSMSweepCutoff") //! When used, a CFNumber value that causes the map to be scanned every kLSMSweepAgeKey days, removing each entry older than 3 times the value of kLSMSweepAgeKey, whose total weight in the map is less than kLSMSweepCutoffKey.
A CFDictionary of properties may be associated with an LSM map. These properties specify map settings, such as which algorithm to use, the precision with which computations are performed, and how many dimensions to use.
In addition, this API defines two keys that allow you to prune from a map those entries that are seen only infrequently. As you add new data to a map, the map sometimes will have a large number of entries that were encountered only a small number of times. To remedy this, you can specify a weight value for the kLSMSweepCutoffKey. When this key-value pair is present in a map's dictionary, the map is scanned at the interval specified by kLSMSweepAgeKey (by default, every 7 days). In each scan, entries are removed if they are older than three times this interval (by default, 21 days) and their weight in the map is less than the value specified by kLSMSweepCutoffKey.
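The pruning rule just described can be written as a single predicate. The function below is a hypothetical model of the documented rule, not LSM's implementation; the parameter names are illustrative, and "older than" is read as strictly greater than three times the sweep interval.

```c
/* Hypothetical model of the documented sweep rule, not LSM's implementation.
 * An entry is removed when it is older than three times the sweep interval
 * (kLSMSweepAgeKey) AND its total weight is below the cutoff
 * (kLSMSweepCutoffKey). */
int should_remove_entry(double entry_age_days, double entry_weight,
                        double sweep_age_days, double sweep_cutoff)
{
    return entry_age_days > 3.0 * sweep_age_days
        && entry_weight < sweep_cutoff;
}
```

With the default interval of 7 days and a cutoff of 2, an entry that is 22 days old with weight 1 would be removed, while an entry that is 30 days old with weight 2 would be kept.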
The keys and properties listed above are the ones currently interpreted by LSM; all other keys starting with LSM are reserved.