Apple Developer Connection


Working with Arrays in awk

The syntax for arrays in awk is very similar to that of arrays in C. Don’t let that fool you, though. Under the hood, they behave very differently.

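Arrays in awk are associative. This means that each array element is stored as a key-value pair, resulting in three major differences when compared to C: the elements are not stored in key order, the for statement can step through an array’s keys directly, and a key can be any string rather than only an integer. These differences are described in “Array Basics.”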

There are two ways to create an array. The first is by simply using it. The second is by using the split function. These methods are described in the sections that follow, along with useful tips about working with arrays.

In this section:

Array Basics
Creating Arrays with split
Copying and Joining an Array
Deleting Array Elements


Array Basics

The following code creates and prints an array called my_array containing the values “Partridge”, “pear”, “tree”, and “Cassidy”:

BEGIN {
        my_array[0] = "Partridge";
        my_array[1] = "pear";
        my_array[2] = "tree";
        my_array["David"] = "Cassidy";
 
        for ( my_index in my_array ) {
                print my_index "=" my_array[my_index];
        }
}

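The first thing you will notice is that the array is not necessarily printed in order. The for statement visits the elements in whatever order the awk implementation happens to store them internally; that order is unspecified and can differ from one awk implementation to another. If you want to print the values in key order, you must step through the keys numerically instead, as described later in this section.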

The second thing you will notice is that the for statement can be used to iterate through all of the keys in the array. In this usage, the for statement in awk is like the for statement in a shell script. The for statement array-iterator usage is:

for (key_variable in array_name) statement

Note: Unlike the for or foreach statements in most other languages, the array-iterator-style for statement in awk iterates through the array keys (indices) rather than through the array values. Thus, it is similar to the following Perl statement:

foreach my $key_variable (keys %assoc_array) { ... }

Because key_variable contains the key from each key-value pair rather than the value, you must explicitly use the key as an array index if you want to obtain the values in the array. For example:

for ( i in arr ) {
        print arr[i];
}

The third thing you will notice is that, unlike C, array elements can take arbitrary strings as their key (array index). If you need to iterate through the array in key order, however, you should limit yourself to numeric keys.

As a side effect, the keys are always stored as strings, even if they contain only digits. Thus, if you want to compare keys numerically (for example, to find the smallest key for which a value exists), you must add zero (0) to each key before making the comparison. Otherwise, awk compares the keys as strings, and “10” sorts before “9”.
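To see why the zero matters, here is a minimal sketch, assuming a POSIX-style awk (such as gawk or mawk) is on the PATH. The shell function name compare_keys_demo is hypothetical, and the two variables a and b stand in for keys handed back by the for statement, which awk likewise holds as strings:

```shell
# Hypothetical demo: string versus numeric comparison in awk.
compare_keys_demo() {
  awk 'BEGIN {
    # Both operands are strings, so awk compares them character by
    # character: "1" sorts before "9", so "10" < "9" is true (prints 1).
    a = "10"; b = "9"
    print (a < b)
    # Adding zero converts each operand to a number: 10 < 9 is false (prints 0).
    print (a + 0 < b + 0)
  }'
}

compare_keys_demo
```

With plain strings the comparison is character by character; after adding zero it is numeric, which is what a search for the smallest or largest key needs.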

For example, the following code iterates through a sparse array (one with gaps between its numeric keys) in key order by first finding the minimum and maximum keys and then counting from the minimum to the maximum:

BEGIN {
        my_array[0] = "Partridge";
        my_array[1] = "pear";
        my_array[2] = "tree";
        my_array[13] = "Cassidy";
 
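        # Seed min and max from the first key, and add zero so
        # that every comparison below is numeric, not string.
        first = 1;
        for ( my_index in my_array ) {
                key = my_index + 0;
                if (first || key < min) min = key;
                if (first || key > max) max = key;
                first = 0;
        }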
        for (i=min; i<= max; i++) {
                if (i in my_array) {
                        print i "=" my_array[i];
                }
        }
}

In this example, note the if statement near the end. Before printing an array value, the example checks whether an element exists for that key. Unlike a reference such as my_array[i], this test does not create the element if it is missing:

if (i in my_array) { ... }
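The distinction matters because merely referring to an element creates it. The following sketch (hypothetical function name membership_demo, assuming a POSIX-style awk) tests for a key with in, then refers to it directly, then tests again:

```shell
# Hypothetical demo: "in" tests membership without creating elements,
# but referring to arr[key] creates that element with an empty value.
membership_demo() {
  awk 'BEGIN {
    arr["a"] = 1

    # Membership test only: "b" is not added to the array.
    if ("b" in arr) print "b present"
    else print "b absent"

    # This reference creates arr["b"] with an empty value.
    if (arr["b"] == "") print "b is empty"

    # "b" now exists, even though nothing was assigned to it.
    if ("b" in arr) print "b present"
    else print "b absent"
  }'
}

membership_demo
```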

Note: Standard awk has no built-in function for sorting an array (gawk adds asort and asorti as extensions). Instead, awk assumes that you will do any sorting externally, after awk has finished, using the sort tool or a similar tool. For performance reasons, you should generally do so.
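As a minimal sketch of that approach, assuming a POSIX-style awk and the standard sort utility, the hypothetical shell function sorted_pairs below prints key=value pairs in whatever order the for statement produces and lets sort arrange them by numeric key:

```shell
# Hypothetical demo: awk emits pairs in unspecified order; sort orders them.
sorted_pairs() {
  awk 'BEGIN {
    my_array[13] = "Cassidy"
    my_array[0]  = "Partridge"
    my_array[2]  = "tree"
    my_array[1]  = "pear"
    for (k in my_array) print k "=" my_array[k]
  }' | sort -t '=' -k 1,1n
}

sorted_pairs
```

The -t '=' option makes sort split each line at the equals sign, and -k 1,1n sorts on the first field (the key) numerically.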

Creating Arrays with split

Assigning array elements individually can be very tedious. A more common (read “less painful”) way to create an array is with the split function. The split syntax is as follows:

count = split( string, array_name, regexp );

For example, the following code splits the string “Mary lamb freezer” into words separated by spaces.

BEGIN {
        arr_len = split( "Mary lamb freezer", my_array, / / );
}

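The result is that arr_len contains the number three (3). The element my_array[1] contains “Mary”, my_array[2] contains “lamb”, and my_array[3] contains “freezer”. Note that split numbers the elements starting at 1, not 0. If you omit the third argument, split uses the current field separator, FS.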
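The following sketch (the function name split_demo is hypothetical; it assumes a POSIX-style awk) shows split with the separator omitted, split with a regular-expression separator, and a numeric loop that visits the resulting elements in order:

```shell
# Hypothetical demo of split: default separator, regular-expression
# separator, and walking the result in order from index 1.
split_demo() {
  awk 'BEGIN {
    # With no third argument, split uses FS. The default FS (" ") treats
    # runs of blanks as one separator and ignores leading/trailing blanks.
    n = split("  Mary   lamb freezer ", words)
    for (i = 1; i <= n; i++) print i ": " words[i]

    # Regular-expression separator: one or more commas or semicolons.
    m = split("a,b;;c", parts, /[,;]+/)
    print m ": " parts[1] parts[2] parts[3]
  }'
}

split_demo
```

Because split returns the element count, a loop from 1 to that count is the usual way to process the pieces in their original order.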

Copying and Joining an Array

The awk language does not support assignment of arrays. Thus, to copy an array, you must copy the values one element at a time. For example, the following code initializes my_array, copies its contents to copy_array, and then prints the copy:

BEGIN {
        arr_len = split( "Mary lamb freezer", my_array, / / );
        for (word in my_array) {
                copy_array[word] = my_array[word];
        }
        for (word in copy_array) {
                print copy_array[word];
        }
}
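If you copy arrays often, you can wrap the loop in a function. The sketch below is one possible approach, not part of awk itself: copy_elements and copy_demo are hypothetical names, and a POSIX-style awk is assumed. It relies on awk passing arrays to functions by reference, so the function can fill in the destination array directly:

```shell
# Hypothetical demo: a reusable array-copy function.
copy_demo() {
  awk '
  # The extra parameter k after the real ones is a local variable (awk idiom).
  function copy_elements(src, dst,    k) {
    split("", dst)                  # empty dst first so no stale keys remain
    for (k in src) dst[k] = src[k]
  }
  BEGIN {
    n = split("Mary lamb freezer", my_array, / /)
    copy_elements(my_array, copy_array)
    for (i = 1; i <= n; i++) print copy_array[i]
  }'
}

copy_demo
```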

Similarly, the awk language does not provide a function for joining the elements of an array into a string. To join an array, you can write a simple function like this one:

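# Names listed after the extra spaces (string, first, i) are local
# variables; without them, the function would overwrite globals.
function join(input_array, separator,    string, first, i) {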
        string = "";
        first = 1;
 
        # Note: the array items are in no particular
        # order when joined with this function.
        for (i in input_array) {
                if (first) first = 0;
                else string = string separator;
                string = string input_array[i];
        }
        return string;
}
BEGIN {
        arr_len = split( "foo bar baz", my_array, / /);
 
        for (word in my_array) {
                print my_array[word];
        }
 
        print join(my_array, " ");
}

Because it uses the array-iterator form of the for statement, this join function concatenates the elements in no particular order. If you need to join the array values in a particular order, you must write your own custom join function, using either a numeric iterator or a manually specified list of keys. For example:

# count is the number of elements (for example, the value
# returned by split); string and i are local variables.
function join(input_array, count, separator,    string, i) {
        string = "";
 
        # Note: this preserves order, but does not
        # work with non-numeric or sparse arrays.
        for (i=1; i<=count; i++) {
                if (i > 1) string = string separator;
                string = string input_array[i];
        }
        return string;
}
BEGIN {
        arr_len = split( "foo bar baz", my_array, / /);
 
        for (word in my_array) {
                print my_array[word];
        }
 
        print join(my_array, arr_len, " ");
}
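The other approach, joining by a manually specified list of keys, also works for non-numeric keys. Here is a minimal sketch assuming a POSIX-style awk; join_keys and join_keys_demo are hypothetical names, and the key list is assumed to be a single string of space-separated keys:

```shell
# Hypothetical demo: join values in the order given by an explicit key list.
join_keys_demo() {
  awk '
  # keylist is a space-separated list of keys; the extra parameters
  # after separator are local variables.
  function join_keys(arr, keylist, separator,    keys, n, i, s) {
    n = split(keylist, keys, / /)
    s = ""
    for (i = 1; i <= n; i++) {
      if (i > 1) s = s separator
      s = s arr[keys[i]]   # a missing key is created here, with an empty value
    }
    return s
  }
  BEGIN {
    person["first"] = "David"
    person["last"]  = "Cassidy"
    print join_keys(person, "last first", ", ")
  }'
}

join_keys_demo
```

Because the function refers to arr[keys[i]] directly, a key that is missing from the array is created with an empty value as a side effect; test with (key in arr) first if that matters.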

Deleting Array Elements

As you saw in “Array Basics,” you can add values to an array using arbitrary keys. You can also check to see if a value exists for a given key using the if (key in array) syntax.

You might try to remove a key-value pair by assigning an empty value to it. However, the if (key in array) test still evaluates to true, because the element still exists (its value is simply empty). Thus, you probably want to remove the key entirely.

The awk programming language solves this problem with the delete statement. The syntax for delete is:

delete array_name[key];

For example, the following script prints only the key-value pairs “purple=Partridge” and “majesties=tree” (in no particular order).

BEGIN {
        my_array["purple"] = "Partridge";
        my_array["mountain"] = "pear";
        my_array["majesties"] = "tree";
        my_array["fruited"] = "Cassidy";
 
        mykey = "fruited";
        delete my_array["mountain"];
        delete my_array[mykey];
 
        for (i in my_array) {
                print i "=" my_array[i];
        }
}
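The following minimal sketch (hypothetical function name empty_vs_delete_demo, assuming a POSIX-style awk) contrasts assigning an empty value with deleting the key:

```shell
# Hypothetical demo: an empty value keeps the key; delete removes it.
empty_vs_delete_demo() {
  awk 'BEGIN {
    my_array["mountain"] = "pear"

    # Assigning the empty string leaves the key in the array.
    my_array["mountain"] = ""
    if ("mountain" in my_array) print "empty value: present"
    else print "empty value: absent"

    # delete removes the key itself.
    delete my_array["mountain"]
    if ("mountain" in my_array) print "deleted: present"
    else print "deleted: absent"
  }'
}

empty_vs_delete_demo
```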

If you need to clear all values from an array simultaneously, though, you don’t have to delete them one at a time. Instead, you can simply do the following:

delete array_name;

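This statement leaves the array specified by array_name empty for future use. You might do this if, for example, you want an array to be reset for each record. Not every awk implementation accepts delete with an array name alone; a portable alternative is split("", array_name), which empties the array because split deletes all existing elements before storing new ones.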
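As a sketch of the per-record reset, assuming a POSIX-style awk, the hypothetical shell function distinct_words counts the distinct words on each input line. It empties the array with split("", seen) rather than delete seen, so that it also runs on awk versions that do not accept delete with a bare array name:

```shell
# Hypothetical demo: count distinct words per line, resetting the array
# at the start of every record.
distinct_words() {
  awk '{
    split("", seen)          # portable way to empty the array
    count = 0
    for (i = 1; i <= NF; i++) {
      if (!($i in seen)) {
        seen[$i] = 1
        count++
      }
    }
    print count
  }'
}

printf 'a b a\nb c\n' | distinct_words
```

For this input, the function prints 2 for each line. Without the split call, the second line would report 1, because b would still be in seen from the first line.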





Last updated: 2008-04-08





Copyright © 2007 Apple Inc.
All rights reserved.