PDF files may contain multiple pages of images and text. You can use Quartz to access the metadata at the document and page levels as well as objects on a PDF page. This section provides a very brief introduction to the metadata you can access.
A PDF document object (CGPDFDocument) contains all the information that relates to a PDF document, including its catalog and contents. The entries in the catalog recursively describe the contents of the PDF document. You can access the contents of a PDF document catalog by calling the function CGPDFDocumentGetCatalog.
A PDF page object (CGPDFPage) represents a page in a PDF document and contains information that relates to a specific page, including the page dictionary and page contents. You can obtain a page dictionary by calling the function CGPDFPageGetDictionary.
Figure 14-1 shows some of the metadata for a PDF document displayed by the Voyeur sample application. After you install the Xcode Tools CD, you can find the Xcode project for this application in:
/Developer/Examples/Quartz/PDF/Voyeur
The metadata shown in the figure describes the two images—the text and the image of the rooster—that make up the PDF file displayed in Figure 13-2.
The items in Figure 14-1 are only a sample of the information available through PDF metadata. For example, you can check whether a PDF page has a thumbnail image (shown in Figure 14-2) using the code shown in Listing 14-1.
Listing 14-1 Code that gets a thumbnail view of a PDF
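CGPDFDictionaryRef d;
CGPDFStreamRef stream; // represents a sequence of bytes
CGPDFDataFormat format; // receives the format of the copied data
CFDataRef data = NULL;
// page is assumed to be a valid CGPDFPageRef obtained earlier
d = CGPDFPageGetDictionary(page);
// check for thumbnail data
if (CGPDFDictionaryGetStream(d, "Thumb", &stream)) {
    // get the data if it exists; release it with CFRelease when done
    data = CGPDFStreamCopyData(stream, &format);
}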
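Quartz performs the decryption and decoding of the data stream for you. On return, format indicates whether the copied data is raw bytes or is still JPEG or JPEG 2000 encoded.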
Quartz provides a number of functions that you can use to obtain individual values for items in the PDF metadata. You use the function CGPDFObjectGetValue, passing a CGPDFObjectRef, a PDF object type (kCGPDFObjectTypeBoolean, kCGPDFObjectTypeInteger, and so forth), and storage for the value. If the object can be read as the requested type, the function returns true and fills the storage with the value; otherwise it returns false.
There are numerous other functions you can use to traverse the hierarchy of a PDF file to access the various nodes and their children. For example, the CGPDFArray functions (CGPDFArrayGetBoolean, CGPDFArrayGetDictionary, CGPDFArrayGetInteger, and so forth) let you access arrays of values to retrieve values of specific types. You can find out more about how to use these functions by looking at the Voyeur Xcode project and reading the PDF specification.
Last updated: 2007-12-11