Apple Developer Connection

Inspecting PDF Document Structure

PDF files may contain multiple pages of images and text. You can use Quartz to access the metadata at the document and page levels as well as objects on a PDF page. This section provides a very brief introduction to the metadata you can access.

A PDF document object (CGPDFDocument) contains all the information that relates to a PDF document, including its catalog and contents. The entries in the catalog recursively describe the contents of the PDF document. You can access the contents of a PDF document catalog by calling the function CGPDFDocumentGetCatalog.

A PDF page object (CGPDFPage) represents a page in a PDF document and contains information that relates to a specific page, including the page dictionary and page contents. You can obtain a page dictionary by calling the function CGPDFPageGetDictionary.

Figure 14-1 shows some of the metadata for a PDF document displayed by the Voyeur sample application. After you install the Xcode Tools CD, you can find the Xcode project for this application in:

/Developer/Examples/Quartz/PDF/Voyeur

The metadata shown in the figure describes the two images—the text and the image of the rooster—that make up the PDF file displayed in Figure 13-2.


Figure 14-1  Metadata for two images in a PDF file

The items in Figure 14-1 are just a sample of the information you can obtain by accessing PDF metadata. For example, you can check whether a PDF page has a thumbnail image (thumbnails are shown in Figure 14-2) using the code shown in Listing 14-1.

Listing 14-1  Code that gets the thumbnail data for a PDF page

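CGPDFDictionaryRef d;
CGPDFStreamRef stream; // represents a sequence of bytes
CGPDFDataFormat format; // receives the encoding of the returned data
CFDataRef data = NULL;
d = CGPDFPageGetDictionary(page); // page is a CGPDFPageRef obtained earlier
// check for thumbnail data
if (CGPDFDictionaryGetStream(d, "Thumb", &stream)) {
    // get the data if it exists; the caller must release it
    data = CGPDFStreamCopyData(stream, &format);
    if (data != NULL) {
        // ... use the thumbnail data here ...
        CFRelease(data);
    }
}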

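Quartz decrypts the stream and decodes its filters for you, except that JPEG and JPEG 2000 image data is returned in its compressed form. On return, format reports which case applies: CGPDFDataFormatRaw, CGPDFDataFormatJPEGEncoded, or CGPDFDataFormatJPEG2000.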


Figure 14-2  Thumbnail images

Quartz provides a number of functions that you can use to obtain individual values for items in the PDF metadata. You use the function CGPDFObjectGetValue, passing a CGPDFObjectRef, a PDF object type (kCGPDFObjectTypeBoolean, kCGPDFObjectTypeInteger, and so forth), and storage for the value. If the object is of the requested type, the function fills the storage with the value and returns true; otherwise it returns false. As a special case, an integer object can also be retrieved as kCGPDFObjectTypeReal.

There are numerous other functions you can use to traverse the hierarchy of a PDF file to access the various nodes and their children. For example, the CGPDFArray functions (CGPDFArrayGetBoolean, CGPDFArrayGetDictionary, CGPDFArrayGetInteger, and so forth) let you retrieve values of specific types from a PDF array, and the corresponding CGPDFDictionary functions do the same for dictionary entries. You can find out more about how to use these functions by looking at the Voyeur Xcode project and reading the PDF specification.



Last updated: 2007-12-11





Copyright © 2007 Apple Inc.
All rights reserved.