Data processing with the Linux tools
Python
If you prefer Python [7] over Perl, check out the script in Listing 3, which is similar to Listing 2 but relies on two modules from the standard library. Lines 2 and 3 import the fileinput [8] and re [9] modules, which make it easy to read the input and to use regular expressions.
Listing 3
Python
01 #!/usr/bin/python
02 import fileinput
03 import re
04 total = 0
05 for line in fileinput.input():
06     if re.match(r'\d+\t+.*\t+.*\t*\d+', line):
07         columns = re.split(r'\t+', line)
08         total += int(columns[0]) * int(columns[3])
09 print("Total: %i miles" % (total))
The variable total, defined in line 4, is initialized with a value of zero. The for loop (lines 5-8) processes the driving log and iterates line by line over the input stream.
The call to the fileinput.input() function makes the data stream available as an iterable sequence of lines. The lines come from the file provided as an invocation parameter or, if no file is given, from STDIN.
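The behavior of fileinput.input() can be tried out in isolation. The following is only a minimal sketch, not part of Listing 3: The sample rows and column names are made up for illustration, and the file path is passed explicitly through the files parameter so the example does not depend on command-line arguments. Without that parameter, the function falls back to the file names in sys.argv[1:] or, if there are none, to STDIN.

```python
import fileinput
import os
import tempfile

# Hypothetical sample data; the real drivinglog.txt is not reproduced here.
sample = "trips\tfrom\tto\tmiles\n2\tHome\tOffice\t15\n1\tHome\tAirport\t40\n"

# Write the sample to a temporary file so there is something to read.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

lines = []
# fileinput.input() returns an iterable object that yields one line at a time.
with fileinput.input(files=[path]) as stream:
    for line in stream:
        lines.append(line)

os.remove(path)
print(len(lines), "lines read")
```

Each element of lines still carries its trailing newline, which is why the later conversion step has to tolerate whitespace.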
With the help of a regular expression, line 6 then checks to see whether the lines read conform to the desired structure. The only lines considered are those comprising one or more digits followed by one or more tab characters, zero or more arbitrary characters, one or more tabs, zero or more arbitrary characters, zero or more tabs, and finally one or more digits. This regex automatically skips over the header line of the driving log.
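To see the filter at work, the pattern from line 6 can be applied to a header and a data line. This is a sketch under assumptions: The two sample lines, including the column names, are invented, and the pattern is written as a raw string.

```python
import re

# Same pattern as line 6 of Listing 3, written as a raw string.
pattern = re.compile(r'\d+\t+.*\t+.*\t*\d+')

# Hypothetical lines; the column names and values are made up.
header = "trips\tfrom\tto\tmiles\n"
record = "2\tHome\tOffice\t15\n"

# match() only succeeds at the start of the string, so a line that
# does not begin with a digit is rejected immediately.
header_ok = pattern.match(header) is not None
record_ok = pattern.match(record) is not None
print(header_ok, record_ok)
```

With these sample lines, the header is rejected and the data line is accepted.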
Again using a regular expression, the script splits each matching line into individual columns at the tabs (line 7). Line 8 converts the first and fourth columns to integers with int(), multiplies them, and adds the product to total, which therefore remains an integer.
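The split-and-accumulate step can be sketched on its own as follows. The records are hypothetical, and the reading of the first column as a trip count and the fourth as miles per trip is an assumption made for this example.

```python
import re

# Hypothetical records: assumed trip count in column 1, miles per trip in column 4.
records = ["2\tHome\tOffice\t15\n", "1\tHome\tAirport\t40\n"]

total = 0
for line in records:
    # Split at runs of one or more tabs, as in line 7 of Listing 3.
    columns = re.split(r'\t+', line)
    # int() ignores surrounding whitespace, so the newline left in the
    # last column does not cause an error.
    total += int(columns[0]) * int(columns[3])

print("Total: %i miles" % total)
```

For the two sample records, the products are 30 and 40, giving a total of 70 miles.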
Line 9 produces the final output in the form of total distance. You have the same possibilities for calling the script as for the Perl version referred to previously:
$ python distance.py drivinglog.txt
Total: 1740 miles
$ ./distance.py drivinglog.txt
Total: 1740 miles
$ cat drivinglog.txt | python distance.py
Total: 1740 miles
$ cat drivinglog.txt | ./distance.py
Total: 1740 miles
Tcl
The Tool Command Language (Tcl) [10] might seem out of date, but its capabilities for processing text files remain useful today. The script in Listing 4 also reads its input from STDIN and relies on regular expressions.
Listing 4
Tcl
01 set totaldistance 0
02 while {1} {
03     set line [gets stdin]
04     if {[eof stdin]} {
05         close stdin
06         break
07     }
08     set fields [regexp -all -inline \[^\t\]+ $line]
09     if {[string is integer -strict [lindex $fields 0]]} {
10         incr totaldistance [expr [lindex $fields 0] * [lindex $fields 3]]
11     }
12 }
13 puts "Total: $totaldistance miles"
After defining the totaldistance variable and initializing it to zero (line 1), a while loop (lines 2-12) reads from STDIN (line 3) as long as input data is available. The loop exits when an end of file (eof) condition occurs (lines 4-7).
In line 8, the script separates each line into individual columns with the help of a regular expression that matches every run of non-tab characters, so that one or more tabs act as the separator. The fields variable is a list in which each element represents a column in the driving log.
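For readers coming from the Python listing, the field extraction in line 8 can be approximated with re.findall(), which likewise returns all non-overlapping matches as a list. This is only a rough Python analogue of the Tcl command, and the sample line is hypothetical.

```python
import re

# Made-up record with a doubled tab between two of the columns.
line = "2\tHome\t\tOffice\t15"

# Collect every run of non-tab characters, similar to what
# regexp -all -inline does in line 8 of Listing 4.
fields = re.findall(r'[^\t]+', line)
print(fields)
```

Because the pattern matches the fields rather than the separators, consecutive tabs do not produce empty list elements.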
Line 9 checks whether the character sequence in field 0 is an integer. If so, the line is not a header. The expression in line 10 multiplies fields 0 and 3 (list indexes begin at zero, so these are the first and fourth columns) and adds the result to the total. Note that the incr command accepts an optional second argument, the amount to add, which here is the subtotal for the line.
After the loop ends, line 13 outputs the total distance. The Tcl script expects the driving log via STDIN; therefore, you should use the following invocation to execute the script:
$ cat drivinglog.txt | /usr/bin/tclsh distance.tcl
Total: 1740 miles