Data processing with the Linux tools

Pure Awk

Chaining multiple commands together requires a considerable amount of typing. The work required can be minimized, however, by using just Awk. To do this, you should define three blocks with commands, each enclosed in curly brackets:

BEGIN { FS="\t+" }
{ total += $1 * $4 }
END {printf "Total: %d miles\n", total}

In the first block, which follows the keyword BEGIN, you specify the separator to be applied. You do this by setting the Awk variable FS (field separator) to \t+, a regular expression that matches a sequence of one or more tabs.
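The effect of \t+ can be checked with a small sketch. The sample line below is made up for illustration (it is not taken from the driving log) and deliberately uses runs of one, two, and three tabs between its values:

```shell
# Hypothetical sample line: values separated by 1, 2, and 3 tabs.
line=$(printf '3\t2014-05-02\t\tOffice\t\t\t20')

# FS="\t+" treats every run of tabs as ONE separator: four fields.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t+" } { print NF ": " $1 " * " $4 }')
echo "$fields"    # prints: 4: 3 * 20

# FS="\t" treats every single tab as a separator: empty fields appear.
single=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }')
echo "$single"    # prints: 7
```

With a single tab as the separator, the fourth field would be "Office" rather than the mileage, which is why the regular expression is the safer choice for hand-edited files.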

The second block has no keyword in front of it, so Awk runs it for every line it reads. The block picks the first and fourth columns of the line, multiplies the values together, and adds the product to the total variable via the += assignment operator. Awk creates total on first use, so no declaration is needed. If a column contains non-numeric text, such as a column heading, Awk treats it as zero in the calculation. This Awk convention makes special handling of the heading line unnecessary.
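A minimal sketch of this convention follows. The column layout (trips, date, destination, miles) and the values are assumptions made for the example, not the contents of the actual drivinglog.txt:

```shell
# Hypothetical log: one heading line followed by two data lines.
log=$(printf 'Trips\tDate\tDestination\tMiles\n2\t2014-05-02\tOffice\t20\n1\t2014-05-03\tAirport\t45')

# Print the product for each line instead of summing it,
# so the contribution of the heading line becomes visible.
products=$(printf '%s\n' "$log" | awk 'BEGIN { FS = "\t+" } { print $1 * $4 }')
echo "$products"
# prints:
# 0
# 40
# 45
```

The heading line yields 0 because both "Trips" and "Miles" convert to zero, so it does not distort the sum.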

The keyword END sits in front of the third block, which Awk runs once, after it has read the last input line. There you output the result with the help of the printf function and the format specifier %d, which prints a number as a decimal integer. Awk replaces the format specifier in the output string with the computed value kept in the total variable.
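One consequence of %d is worth a quick sketch: fractional totals are cut off at the decimal point. The two data lines below use invented fractional mileages, and the second format specifier (%.1f) is added only for comparison:

```shell
# Hypothetical data: 1 trip of 10.5 miles and 2 trips of 20.4 miles.
total_line=$(printf '1\tx\ty\t10.5\n2\tx\ty\t20.4\n' |
  awk 'BEGIN { FS = "\t+" }
       { total += $1 * $4 }
       END { printf "Total: %d miles (unrounded: %.1f)\n", total, total }')
echo "$total_line"    # prints: Total: 51 miles (unrounded: 51.3)
```

The END block runs only once, so the line is printed a single time no matter how many input lines were processed. If your log contains fractional distances, a specifier such as %.1f may suit you better than %d.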

If you save the Awk script in a file (e.g., distance.awk), you can reuse it whenever needed. You start the computation and receive the result with:

$ awk -f distance.awk drivinglog.txt
Total: 1740 miles

The -f switch tells Awk to read the script from the indicated file. The command shown assumes that both the data file and the Awk script are found in the current directory. If they are located elsewhere, you should provide the appropriate paths.
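As a sketch of the path variant, the following commands place the script and the data in separate directories and call Awk with explicit paths. The directory names (scripts, data) and the sample log line are assumptions chosen for the example:

```shell
# Create a throwaway directory tree (hypothetical layout).
workdir=$(mktemp -d)
mkdir "$workdir/scripts" "$workdir/data"

# Save the three-block Awk script to a file.
cat > "$workdir/scripts/distance.awk" <<'EOF'
BEGIN { FS = "\t+" }
{ total += $1 * $4 }
END { printf "Total: %d miles\n", total }
EOF

# Write a tiny sample log: heading line plus one data line.
printf 'Trips\tDate\tDestination\tMiles\n2\t2014-05-02\tOffice\t20\n' \
  > "$workdir/data/drivinglog.txt"

# Both the script and the data file are addressed by path.
result=$(awk -f "$workdir/scripts/distance.awk" "$workdir/data/drivinglog.txt")
echo "$result"    # prints: Total: 40 miles

rm -r "$workdir"
```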

Perl

The Perl [6] script in Listing 2 first checks in line 2 whether you have provided parameters to the script. If not, it terminates without comment. Line 3 defines the variable total for the total amount and initializes this with a zero value.

Listing 2

Perl

01 #!/usr/bin/perl
02 exit unless @ARGV;
03 my $total = 0;
04 while(<>) {
05  my @fields = split(/\t+/);
06  $total += $fields[0] * $fields[3];
07 }
08 print "Total: $total miles\n";

The while loop (lines 4-7) processes the input stream, which originates either from the files named on the command line or, if none are named, from STDIN (standard input), which is usually the keyboard. The script divides each input line (split()) into individual fields by recognizing the tab character as the separator and saves the fields in the @fields array (line 5). The split() function uses the regular expression \t+, which, as explained earlier, represents a sequence of one or more tabs.

The distance for each trip can then be computed from the contents of the first and fourth fields (line 6). Keep in mind that Perl array indexes start at zero, whereas the field numbers in the earlier solutions start at one. The first and fourth fields are therefore $fields[0] and $fields[3]. Line 6 adds the distance for each trip to the current total, and line 8 outputs the total distance driven.

You have four ways to invoke the script,

$ ./distance.pl drivinglog.txt
Total: 1740 miles
$ perl distance.pl drivinglog.txt
Total: 1740 miles
$ cat drivinglog.txt | perl distance.pl
Total: 1740 miles
$ cat drivinglog.txt | ./distance.pl
Total: 1740 miles

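which makes for flexible integration into further processing. The two ./distance.pl forms require that the script be executable (chmod +x distance.pl). Note that the invocations that use cat work only if you leave line 2 out of the Perl script: line 2 checks for the presence of invocation parameters, finds none when the data arrives through a pipe, and terminates the script before it reads any input.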
