Free Ride
Linux has a powerful set of programming tools that lets you manipulate data without the need to install additional software.
Computers were originally developed to perform computations, so it makes sense that they come equipped with all the tools you need to solve a variety of simple computation problems. A number of diverse and widely available scripting languages can manipulate data and deliver the information you need.
To demonstrate Linux tools at work, I use three common scripting languages, Perl, Python, and Tcl, to address an everyday problem: calculating the total distance driven based on entries in a driver's logbook. I also solve the same problem using the database management system PostgreSQL and the open source LibreOffice Calc spreadsheet program.
Drivers often have to keep a logbook as a job requirement or for tax purposes. Sometimes, the only number needed in the end is the entire distance driven. The question becomes how to compute this number from the log entries with the least amount of effort. Creating and installing a special program to do this would be a bit over the top, so the solution presented here demonstrates how to solve this problem using the tools you can find on a Raspberry Pi.
The focus of the solution was to create a compact, standards-conforming, and easily understood software module. Instead of optimizing to the last bit for every element in the module, I put everything together so that the end result could be easily understood, even after the passage of time.
The variations on this solution brought together ideas from the participants of Unix/Linux courses I have taught that target certification from the Linux Professional Institute [1]. The techniques explored in this article emphasize concise and well thought-out solutions. Variety was the guiding principle. Each of the solutions delivers the correct result but requires unique explanations and unique foreknowledge for understanding the underlying and somewhat-cryptic program code.
The example and the solutions presented in this article can apply to additional themes, such as computing cash register receipts, calculating a shopping cart total for online shopping, or determining download statistics for web pages to see how frequently content on a website is accessed.
A simple text file serves as the starting point for the computations. The file is divided into five columns.
The first line of each file contains a heading, which is followed by rows of data representing the trips taken.
The text file has a very simple name, drivinglog.txt. The log should have a pleasant appearance when printed, which is achieved by means of a format that uses a variable number of tabs for spacing between columns (Figure 1).
The first approach to thinking about the solution was to look at the Raspbian toolbox. Listing 1 puts together the commands and tools cat, tr, tail, awk, sed, and bc. Each of these tools filters the data stream and modifies it without changing the original file (Table 1).
Listing 1
Raspbian Toolbox
$ cat drivinglog.txt \
> | tr -s '\t' ':' \
> | tail --lines=+2 \
> | awk -F : '{print $1 * $4}' \
> | tr '\n' '+' \
> | sed 's/+$/+0\n/' \
> | bc
1740
Table 1
Raspbian Tools
Command | Tool Summary | Action |
---|---|---|
cat drivinglog.txt | Copy (concatenate) input to standard output | Reads the file containing the driving log and outputs the content to STDOUT (standard output data stream), which is normally associated with the monitor; in this case, however, it is piped to the next tool. |
tr -s '\t' ':' | Translate/transliterate or delete characters | Replaces the multiple tabs in the data stream with colons, a separator that otherwise does not appear in the driving log. |
awk -F : '{print $1 * $4}' | Pattern scanning and processing language [2] | The -F : switch interprets a colon as a separator between columns in a row of data; the print command takes the first and fourth columns from each line and outputs the product. |
tr '\n' '+' | Translate/transliterate or delete characters | Replaces all of the line breaks (\n) in the data stream with a plus sign. |
sed 's/+$/+0\n/' | Stream editor to filter and transform text [3] | Searches for a plus symbol at the end of the line and replaces it with a + followed by a zero and a line break. |
bc | Arbitrary-precision arithmetic language [4] | Uses the plus operators inserted between the values to sum the lines. |
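The first stages of the toolchain can be tried out on a small throwaway file. The following is only a sketch: the column names and values are invented for illustration (the real log is the one shown in Figure 1), and the short option tail -n +2 is used as the equivalent of --lines=+2.

```shell
# Hypothetical sample log: column names and values are invented for illustration.
# Columns are separated by one or more tabs, as in the article's file.
sample=$(mktemp)
printf 'Trips\tDate\t\tDestination\tKm\tPurpose\n'  > "$sample"
printf '2\t2015-03-02\tOffice\t\t15\tCommute\n'    >> "$sample"
printf '1\t2015-03-03\tAirport\t\t40\tClient\n'    >> "$sample"

# Turn each run of tabs into a single colon, then drop the heading line.
rows=$(tr -s '\t' ':' < "$sample" | tail -n +2)
printf '%s\n' "$rows"

# Multiply column 1 by column 4 for every remaining row.
products=$(printf '%s\n' "$rows" | awk -F : '{print $1 * $4}')
printf '%s\n' "$products"

rm -f "$sample"
```

With this invented data, the two rows yield the products 30 and 40, each on its own line, which is the form the later stages of the pipeline expect.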
For purposes of clarity, I entered each individual command of the driving log toolchain on its own line in the console. The backslash (\) at the end of each line serves as a continuation symbol, indicating that the shell (which prompts with the > symbol) should handle the next line as a part of the previous line. The output of one tool is streamed as input to the next tool via a pipe, which is represented by the vertical line (|) on your keyboard; the data streams through a FIFO (first in, first out) buffer between the two processes. You could also leave the backslashes out and enter all of the commands in a single line.
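As a small illustration of the continuation symbol, the sketch below runs the same two-stage pipeline once split across lines and once on a single line. The input string and the variable names are made up for the example.

```shell
# Multi-line form: the trailing backslash joins the next line to this one.
continued=$(printf 'a\tb\n' \
  | tr '\t' ':')

# Single-line form of the same pipeline.
single=$(printf 'a\tb\n' | tr '\t' ':')

# Both variables hold the same text.
printf '%s\n%s\n' "$continued" "$single"
```

Both forms should produce identical output, because the shell assembles the continued lines into one command before running it.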
After the awk command, the intermediate result for each trip is located on a separate line that ends with a line break (\n). Because I want to add the results, the bc tool needs an operator between values. When tr substitutes plus symbols for line breaks, you end up with a single line with a mathematical expression that bc can then process.
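The effect of this substitution is easy to check in isolation. In the sketch below, the three numbers are made-up stand-ins for the per-trip results that awk would emit.

```shell
# Three made-up per-trip results, one per line, as awk would emit them.
joined=$(printf '30\n40\n25\n' | tr '\n' '+')

# Every line break, including the last one, has become a plus sign.
printf '%s\n' "$joined"    # prints 30+40+25+
```

Note the plus sign at the very end of the line: it comes from the line break that followed the last value.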
Unfortunately, the final line break has also been replaced, so the expression ends with a dangling plus sign and no line break. If left unresolved, bc would be missing the last operand it needs to perform the computation. The regular expression [5] (regex) passed to the stream editor sed resolves this problem by ending the line with a "+0" and a line break.
This action does not change the total distance, and it creates a correct expression and a line break at the end, which bc needs to add all of the values together. The bc tool then outputs the resulting total to STDOUT.
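The last two stages can also be tried on their own. This sketch starts from a made-up expression with a dangling plus sign and assumes GNU sed, as shipped with Raspbian, which interprets \n in the replacement text as a line break; other sed implementations may behave differently.

```shell
# Made-up expression with the dangling plus sign left behind by tr.
expression=$(printf '30+40+25+' | sed 's/+$/+0\n/')
printf '%s\n' "$expression"    # prints 30+40+25+0 (GNU sed)

# bc evaluates the completed expression and prints the sum.
total=$(printf '%s\n' "$expression" | bc)
printf '%s\n' "$total"         # prints 95
```

Adding zero leaves the sum unchanged, so the repaired expression evaluates to the same total as the values alone.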