Data processing with the Linux tools
Pure Awk
Chaining multiple commands together requires a considerable amount of typing. The work required can be minimized, however, by using just Awk. To do this, you should define three blocks with commands, each enclosed in curly brackets:
BEGIN { FS="\t+" }
{ total += $1 * $4 }
END { printf "Total: %d miles\n", total }
The first block follows the keyword BEGIN and specifies the separator to be applied. It does so by setting the Awk variable FS (field separator) to \t+, a regular expression that matches a sequence of one or more tabs.
The second block uses a variable named total, which Awk creates on first use with a value of zero. You tell Awk to pick the first and fourth columns of each line it reads and to multiply the two values. The += assignment operator then adds each product to total. If a column contains non-numeric text, Awk treats it as zero, so the heading line adds nothing to the total and needs no special handling.
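The zero-value convention can be checked with a small sketch. The sample log below is hypothetical: the column names and values are invented for illustration, and only the positions of the first and fourth columns follow the text. The extra print statement is added here solely to show each line's product.

```shell
# Hypothetical tab-separated log with a heading line (names and values are assumptions).
log=$(printf 'Trips\tDate\tDestination\tMiles\n2\t2013-01-05\tBoston\t100\n1\t2013-01-09\tAlbany\t40\n')

# Print the product for every line, then the total.
# The heading line yields 0 because "Trips" and "Miles" are not numbers.
result=$(printf '%s\n' "$log" | awk '
BEGIN { FS="\t+" }
{ print $1 * $4; total += $1 * $4 }
END { printf "Total: %d miles\n", total }')

echo "$result"
```

The first output line is the heading's product, which is why the total is unaffected by it.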
The keyword END precedes the third block, which Awk executes once after it has read the last input line. There you output the result with the printf function and the format specifier %d (a decimal integer). Awk replaces the format specifier in the output string with the value stored in the total variable.
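As a minimal sketch of how the blocks differ in timing, the following counts input lines: the unlabeled block runs once per line, and the END block runs a single time afterward. The three input lines and the variable name seen are made up for this illustration.

```shell
# Three hypothetical input lines; the main block runs for each one,
# the END block runs once after the last line has been read.
end_result=$(printf 'a\nb\nc\n' | awk '{ seen++ } END { printf "Lines read: %d\n", seen }')

echo "$end_result"
```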
If you save the Awk script as a file (e.g., distance.awk), you can reuse it whenever needed. You then start the computation and receive the result with:
$ awk -f distance.awk drivinglog.txt
Total: 1740 miles
The -f switch tells Awk to read the script from the indicated file. The call shown assumes that both the data and the Awk script are in the current directory. If they are located elsewhere, provide the appropriate path.
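The case in which neither file is in the current directory might look like the sketch below. It writes the script and a one-trip log into a temporary directory and passes both with their full paths. The directory, the data values, and the variable names are assumptions made for the example.

```shell
# Work in a throwaway directory so the current directory stays untouched.
workdir=$(mktemp -d)

# Save the Awk program from the text as a script file.
cat > "$workdir/distance.awk" <<'EOF'
BEGIN { FS="\t+" }
{ total += $1 * $4 }
END { printf "Total: %d miles\n", total }
EOF

# Hypothetical log: a heading line and a single trip (values are invented).
printf 'Trips\tDate\tDestination\tMiles\n3\t2013-02-01\tDenver\t50\n' > "$workdir/drivinglog.txt"

# Both the script and the data are addressed by path.
file_result=$(awk -f "$workdir/distance.awk" "$workdir/drivinglog.txt")
echo "$file_result"

# Clean up the temporary files.
rm -rf "$workdir"
```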
Perl
The Perl [6] script in Listing 2 first checks in line 2 whether you have provided parameters to the script. If not, it terminates without comment. Line 3 defines the variable $total for the total distance and initializes it to zero.
Listing 2
Perl
01 #!/usr/bin/perl
02 exit unless @ARGV;
03 my $total = 0;
04 while(<>) {
05   my @fields = split(/\t+/);
06   $total += $fields[0] * $fields[3];
07 }
08 print "Total: $total miles\n";
The while loop (lines 4-7) processes the input stream, which comes either from the files named on the command line or, if none are named, from STDIN (standard input), which is usually the keyboard. The split() function divides each input line into individual fields, using the tab character as the separator, and the script saves the fields in the @fields array (line 5). The separator is the regular expression \t+, which, as explained earlier, matches a sequence of one or more tabs.
The distance for each trip can then be computed from the contents of the first and fourth fields (line 6). Keep in mind that Perl array indexes start at zero, whereas the field numbers in the earlier solutions started at one. Line 6 adds the distance for each trip to the running total, and line 8 outputs the total distance driven.
You have four ways to invoke the script,
$ ./distance.pl drivinglog.txt
Total: 1740 miles
$ perl distance.pl drivinglog.txt
Total: 1740 miles
$ cat drivinglog.txt | perl distance.pl
Total: 1740 miles
$ cat drivinglog.txt | ./distance.pl
Total: 1740 miles
which makes for flexible integration into further processing. Note, however, that the two invocations that use cat work only if you remove line 2 from the Perl script: line 2 checks for the presence of invocation parameters, and a script fed through a pipe receives none, so it would exit without output.