Quartz provides functions that let you inspect the PDF document structure and the content stream. Inspecting the document structure lets you read the entries in the document catalog and the contents associated with each entry. By recursively traversing the catalog, you can inspect the entire document.
A PDF content stream is just what its name suggests: a sequential stream of data, such as 'BT /F71 12 Tf (draw this text) Tj . . . ', in which PDF operators and their operands are mixed with the actual PDF content. Because operands precede the operator that consumes them, you must read a content stream sequentially, from beginning to end. The functions for parsing PDF content streams are available starting in Mac OS X v10.4.
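To make the idea of sequential access concrete, the following is a minimal sketch in plain C of walking a content stream one token at a time. It does not use the Quartz parsing functions. The names TokenKind, next_token, and count_operators are hypothetical and exist only for this illustration. The tokenizer is deliberately simplified: it assumes literal strings contain no nested or escaped parentheses, and it ignores arrays, dictionaries, hexadecimal strings, and comments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this illustration; these are not Quartz types. */
typedef enum { TOKEN_END, TOKEN_OPERAND, TOKEN_OPERATOR } TokenKind;

/*
 * Reads the next token starting at *cursor, copies it into buf (truncating
 * if necessary), and advances *cursor past the token.
 * Assumption: literal strings have no nested or escaped parentheses.
 */
TokenKind next_token(const char **cursor, char *buf, size_t bufsize)
{
    assert(cursor != NULL && *cursor != NULL && buf != NULL && bufsize > 0);

    const char *p = *cursor;

    /* Skip the white space that separates tokens. */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        *cursor = p;
        buf[0] = '\0';
        return TOKEN_END;
    }

    const char *start = p;
    TokenKind kind;

    if (*p == '(') {
        /* A literal string runs through its closing parenthesis. */
        while (*p != '\0' && *p != ')')
            p++;
        if (*p == ')')
            p++;
        kind = TOKEN_OPERAND;
    } else {
        /* Names (/F71) and numbers are operands; any other word is
           treated as an operator such as BT, Tf, or Tj. */
        char first = *p;
        int is_operand = first == '/' || first == '-' || first == '+' ||
                         first == '.' || isdigit((unsigned char)first);
        kind = is_operand ? TOKEN_OPERAND : TOKEN_OPERATOR;
        while (*p != '\0' && !isspace((unsigned char)*p) && *p != '(')
            p++;
    }

    size_t len = (size_t)(p - start);
    if (len >= bufsize)
        len = bufsize - 1;
    memcpy(buf, start, len);
    buf[len] = '\0';

    *cursor = p;
    return kind;
}

/* Counts the operators in a stream by walking it once, front to back. */
int count_operators(const char *stream)
{
    char buf[64];
    int count = 0;
    TokenKind kind;

    while ((kind = next_token(&stream, buf, sizeof buf)) != TOKEN_END) {
        if (kind == TOKEN_OPERATOR)
            count++;
    }
    return count;
}
```

Calling next_token repeatedly on a stream such as "BT /F71 12 Tf (draw this text) Tj ET" yields the operands for each operator before the operator itself, which is why the stream cannot be read out of order.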
This chapter shows how to examine the structure of a PDF document and how to parse its contents.
Inspecting PDF Document Structure
Parsing PDF Content
Last updated: 2007-12-11