Data processing with the Linux tools

Lead Image © studiostoks, 123RF.com

Free Ride

Linux has a powerful set of programming tools that lets you manipulate data without the need to install additional software.

Computers were originally developed to perform computations, so it makes sense that they come equipped with all the tools you need to solve a variety of simple computation problems. A number of diverse and widely available scripting languages can manipulate data and deliver the information you need.

To demonstrate Linux tools at work, I use three common scripting languages, Perl, Python, and Tcl, to address an everyday problem: calculating the total distance driven based on entries in a driver's logbook. I also solve the same problem using the database management system PostgreSQL and the open source LibreOffice Calc spreadsheet program.

The Problem

Drivers often have to keep a logbook as a job requirement or for tax purposes. Sometimes, the only number needed in the end is the entire distance driven. The question becomes how to compute this number from the log entries with the least amount of effort. Creating and installing a special program to do this would be a bit over the top, so the solution presented here demonstrates how to solve this problem using the tools you can find on a Raspberry Pi.

The focus of the solution was to create a compact, standards-conforming, and easily understood software module. Instead of optimizing to the last bit for every element in the module, I put everything together so that the end result could be easily understood, even after the passage of time.

The variations on this solution bring together ideas from participants in Unix/Linux courses I have taught that target certification from the Linux Professional Institute [1]. The techniques explored in this article emphasize concise and well-thought-out solutions. Variety was the guiding principle. Each of the solutions delivers the correct result, but each requires its own explanation and its own background knowledge to make sense of the underlying, somewhat cryptic program code.

The example and the solutions presented in this article can apply to additional themes, such as computing cash register receipts, calculating a shopping cart total for online shopping, or determining download statistics for web pages to see how frequently content on a website is accessed.

A simple text file serves as the starting point for the computations. The file is divided into five columns:

  1. The number of times a particular distance is traveled, number.
  2. The starting point, from.
  3. The destination, to.
  4. The number of miles traveled, distance.
  5. The reason for the trip, reason.

The first line of each file contains a heading, which is followed by rows of data representing the trips taken.

The text file is named drivinglog.txt. So that the log looks tidy when displayed or printed, the columns are separated by a variable number of tabs (Figure 1).

Figure 1: A variable number of tabs creates an orderly appearance for the driving log when it is displayed and printed.
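As a rough illustration of this layout, a file in the same format could be created as shown in the following sketch. The place names, distances, and reasons are invented for the example and are not the entries shown in Figure 1; the temporary directory is likewise just a convenience so that no existing file is overwritten.

```shell
# Hypothetical sample data, NOT the log from Figure 1.
# Columns: number, from, to, distance, reason, separated by one or more tabs.
log="$(mktemp -d)/drivinglog.txt"
printf 'number\tfrom\t\tto\t\tdistance\treason\n'    >  "$log"
printf '2\tBoston\t\tSalem\t\t25\t\tclient visit\n'  >> "$log"
printf '1\tBoston\t\tProvidence\t50\t\ttraining\n'   >> "$log"
printf '3\tSalem\t\tLowell\t\t30\t\tdelivery\n'      >> "$log"

# Display the log as it would appear on the console.
cat "$log"

# Sanity check: once each run of tabs is squeezed into a single colon,
# every line should contain exactly five fields.
fields=$(tr -s '\t' ':' < "$log" | awk -F : '{print NF}' | sort -u)
echo "fields per line: $fields"
```

The uneven number of tabs is what keeps the columns aligned on screen, and it is also why the tabs have to be squeezed into one separator before the columns can be addressed reliably.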

A Contexture of Tools

The first approach to thinking about the solution was to look at the Raspbian toolbox. Listing 1 puts together the commands and tools cat, tr, tail, awk, sed, and bc. Each of these tools filters the data stream and modifies it without changing the original file (Table 1).

Listing 1

Raspbian Toolbox

$ cat drivinglog.txt \
> | tr -s '\t' ':' \
> | tail --lines=+2 \
> | awk -F : '{print $1 * $4}' \
> | tr '\n' '+' \
> | sed 's/+$/+0\n/' \
> | bc
1740

Table 1

Raspbian Tools

| Command | Tool Summary | Action |
|---|---|---|
| cat drivinglog.txt | Copy (concatenate) input to standard output | Reads the file containing the driving log and outputs the content to STDOUT (standard output data stream), which is normally associated with the monitor; in this case, however, it is piped to the next tool. |
| tr -s '\t' ':' | Translate/transliterate or delete characters | Replaces each run of one or more tabs in the data stream with a single colon, a separator that otherwise does not appear in the driving log. |
| tail --lines=+2 | Output the last part of files | Outputs the data stream starting with the second line, which skips the heading in the first line. |
| awk -F : '{print $1 * $4}' | Pattern scanning and processing language [2] | The -F : switch interprets a colon as a separator between columns in a row of data; the print command takes the first and fourth columns from each line and outputs the product. |
| tr '\n' '+' | Translate/transliterate or delete characters | Replaces all of the line breaks (\n) in the data stream with a plus sign. |
| sed 's/+$/+0\n/' | Stream editor to filter and transform text [3] | Searches for a plus sign at the end of the line and replaces it with a + followed by a zero and a line break. |
| bc | Arbitrary-precision arithmetic language [4] | Evaluates the resulting expression, using the plus operators inserted between the values to sum them. |

For purposes of clarity, I entered each individual command of the driving log toolchain on its own line in the console. The backslash (\) at the end of each line serves as a continuation symbol, indicating that the shell (which prompts with the > symbol) should handle the next line as a part of the previous line. The output of one tool is streamed as input to the next tool via a pipe, which is represented by the vertical line (|) on your keyboard; the data streams through a FIFO (first in, first out) buffer between the two processes. You could also leave the backslashes out and enter all of the commands in a single line.
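The two ways of entering a pipeline can be compared with a small sketch. The input values here are invented and have nothing to do with the driving log; the point is only that the continued form and the single-line form are the same command to the shell. In a script, unlike at the interactive console, no > prompt appears on the continuation lines.

```shell
# Toy pipeline, split across lines with backslash continuations.
multi=$(printf '3 4\n5 6\n' \
  | awk '{print $1 * $2}' \
  | tr '\n' ' ')

# The same pipeline entered on a single line.
single=$(printf '3 4\n5 6\n' | awk '{print $1 * $2}' | tr '\n' ' ')

# Both variables hold the same text, because the shell joins the
# continued lines before it runs anything.
echo "multi:  $multi"
echo "single: $single"
```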

After the awk command, the intermediate result for each trip is on its own line, terminated by a line break (\n). Because I want to add the results, the bc tool needs an operator between values. When tr substitutes plus symbols for line breaks, you end up with a single line with a mathematical expression that bc can then process.

Unfortunately, tr has also replaced the final line break, so the expression now ends with a dangling plus sign and no line break, which bc cannot evaluate. The regular expression [5] (regex) passed to the stream editor sed resolves this problem by replacing the trailing plus sign with "+0" and a line break.

Adding zero does not change the total distance, and the result is a valid expression terminated by a line break, which bc needs to add all of the values together. bc then outputs the resulting total to STDOUT.
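The stages described above can be traced one at a time. The following is a minimal sketch that uses three invented trips rather than the data from Figure 1, so its total differs from the one in Listing 1. It reads the file directly instead of going through cat and uses tail -n +2, the short form of --lines=+2. Like the listing, it assumes GNU sed, which accepts \n in the replacement text.

```shell
# Invented sample log (not the data from Figure 1): header plus three trips.
log="$(mktemp -d)/drivinglog.txt"
printf 'number\tfrom\t\tto\t\tdistance\treason\n'    >  "$log"
printf '2\tBoston\t\tSalem\t\t25\t\tclient visit\n'  >> "$log"
printf '1\tBoston\t\tProvidence\t50\t\ttraining\n'   >> "$log"
printf '3\tSalem\t\tLowell\t\t30\t\tdelivery\n'      >> "$log"

# After tr, tail, and awk: one product (number * distance) per line.
products=$(tr -s '\t' ':' < "$log" | tail -n +2 | awk -F : '{print $1 * $4}')
printf '%s\n' "$products"

# After the second tr and sed: a single expression that ends in +0.
expression=$(printf '%s\n' "$products" | tr '\n' '+' | sed 's/+$/+0\n/')
printf '%s\n' "$expression"

# bc evaluates the expression and prints the sum.
printf '%s\n' "$expression" | bc
```

With these sample rows, the products are 50, 50, and 90, the expression handed to bc is 50+50+90+0, and bc reports 190.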
