Apple Developer Connection

Perl and Python Extensions

The regular expression dialect used in Perl, Python, and many other languages is an extension of basic regular expressions. Some of the major differences include:

Character Class Shortcuts

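Perl regular expressions add a number of character class shortcuts. Some of the most common are listed below:

\d    Any digit
\D    Any character that is not a digit
\s    Any whitespace character
\S    Any character that is not whitespace
\w    Any word character (a letter, digit, or underscore)
\W    Any character that is not a word character
\b    A word boundary
\B    A position that is not a word boundary

The character class shortcuts can be used anywhere in the search pattern (the left side of a substitution expression), including within bracketed character classes. The boundary assertions \b and \B are the exception: inside brackets, \b means a backspace character.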

Note: Word boundaries do not exist in basic regular expressions. A word boundary matches the position between two characters rather than a character itself.

A word boundary occurs before the first character of a line (if it is a word character), at the end of the line (if it ends in a word character), and between any word character and non-word character that occur consecutively.
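As a rough illustration of the shortcuts and the word boundary rules above, here is a minimal sketch using Python's re module, which follows the Perl dialect for these constructs. The sample text and variable names are made up for the example.

```python
import re

# Hypothetical sample text chosen to exercise each shortcut.
text = "Order 66: cat, concat, cat7 and bobcat."

# \d matches one digit; + repeats it, so each run of digits is one match.
digits = re.findall(r"\d+", text)

# Without boundaries, "cat" is found inside longer words as well.
anywhere = re.findall(r"cat", text)

# \b matches a position, not a character, so only the standalone
# word "cat" (between a space and a comma) qualifies.
whole_words = re.findall(r"\bcat\b", text)

# A shortcut such as \w can also appear inside a bracketed class.
tokens = re.findall(r"[\w-]+", "well-known fact")

print(digits)            # ['66', '7']
print(len(anywhere))     # 4
print(whole_words)       # ['cat']
print(tokens)            # ['well-known', 'fact']
```

Note that "cat7" is not matched by \bcat\b, because the digit 7 is itself a word character and so no boundary follows the "t".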

Non-Greedy Wildcard Matching

By default, repeat operators are greedy, matching as many times as possible before attempting to match the next part of the expression. This generally results in the longest possible string that matches the expression as a whole. In some cases, you may want the matching to stop at the shortest possible string that matches the entire expression.

To support this, Perl regular expressions (along with many other dialects) support non-greedy wildcard matching. To convert a greedy repeat operator to a non-greedy one, add a question mark after it.

For example, consider the nursery rhyme "Mary had a little lamb, its fleece was white as snow, and everywhere that Mary went, the lamb was sure to go." Assume that you apply the following expression:

/Mary.*lamb/

That expression would match "Mary had a little lamb, its fleece was white as snow, and everywhere that Mary went, the lamb".

Suppose that instead, you want to find the shortest possible string beginning with Mary and ending with lamb. You might instead use the following expression:

/Mary.*?lamb/

That expression would match only the words "Mary had a little lamb".
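The same comparison can be tried in Python, whose re module uses the same non-greedy syntax. This is only a sketch; the variable names are illustrative.

```python
import re

rhyme = ("Mary had a little lamb, its fleece was white as snow, "
         "and everywhere that Mary went, the lamb was sure to go.")

# Greedy: .* runs as far as it can, so the match ends at the LAST "lamb".
greedy = re.search(r"Mary.*lamb", rhyme).group(0)

# Non-greedy: .*? stops as soon as the rest of the expression can match,
# so the match ends at the FIRST "lamb".
lazy = re.search(r"Mary.*?lamb", rhyme).group(0)

print(greedy)
print(lazy)   # Mary had a little lamb
```

In both cases the match still begins at the leftmost possible position (the first "Mary"). The question mark only changes how far the repeat operator extends from there.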

Non-Capturing Parentheses

You may notice that the syntax for capture is identical to the syntax for grouping described in “Wildcards and Repetition Operators.” In most cases, this is not a problem. However, in some cases, you may wish to avoid capturing content if you are using parentheses merely as a grouping tool.

To turn off capturing for a given set of parentheses (or quoted parentheses), you should add a question mark followed by a colon after the open parenthesis.

Consider the following example:

# Expression (Perl and similar ONLY): /Mary (?:had )*a little lamb\./
perl -e "while (\$line = <STDIN>) {
    \$line =~ s/Mary (?:had )*a little lamb\./Lovely day, isn't it?/;
    print \$line;
}" < poem.txt

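This expression will match "Mary ", followed by zero (0) or more instances of "had " (including the trailing space), followed by "a little lamb", followed by a literal period, and will replace the matched text with "Lovely day, isn't it?". Because the parentheses are non-capturing, the repeated "had " is grouped for the * operator but is not saved as a captured value.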
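To see what turning off capture changes, the following sketch compares the two forms in Python, which supports the same (?:...) syntax. The second pair of parentheses around "lamb" is an addition for illustration only and is not part of the example above.

```python
import re

line = "Mary had had a little lamb."

# Capturing group: "had " becomes group 1 and "lamb" becomes group 2.
capturing = re.search(r"Mary (had )*a little (lamb)\.", line)

# Non-capturing group: "had " is still grouped for the * operator,
# but it is not stored, so "lamb" becomes group 1.
non_capturing = re.search(r"Mary (?:had )*a little (lamb)\.", line)

print(capturing.groups())       # ('had ', 'lamb')
print(non_capturing.groups())   # ('lamb',)

# The substitution behaves the same either way, since it only
# depends on what the whole expression matches.
replaced = re.sub(r"Mary (?:had )*a little lamb\.",
                  "Lovely day, isn't it?", line)
print(replaced)                 # Lovely day, isn't it?
```

The practical benefit is that adding or removing a purely structural group does not renumber the captures that the rest of a script relies on.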

Note: Non-capturing parentheses are a Perl extension to regular expressions, and are not supported by most command-line tools.





Last updated: 2008-04-08




Copyright © 2007 Apple Inc.
All rights reserved.