Data processing with the Linux tools
Database
The calculation also works with a database management system (DBMS), which is designed to support these kinds of tasks. Although you could use any database with an SQL query interface, I use PostgreSQL [11] in this example.
So-called aggregation functions [12] – sum
, min
, max
, and avg
– are used by PostgreSQL to evaluate values. The invocation for this example would be:
SELECT SUM(number*distance) FROM drivinglog;
This time, drivinglog
is not a file, but rather a database table. The terms number
and distance
refer to columns in the drivinglog table. Listing 5 shows the table entries, as well as the invocation for determining the total distance. PostgreSQL generates a results column called sum
that contains the total distance as a single line.
Listing 5
PostgreSQL
database=> select * from drivinglog; ** number| from_city | to_city | distance --------+------------------+---------+--------- 1 | Berlin | Potsdam | 30 2 | Berlin | Hamburg | 280 2 | Frankfurt (Main) | Paris | 575 (3 lines) ** database=> select sum(number*distance) from drivinglog; sum ------ 1740 (1 line)
Spreadsheet
If you would rather click around than program, you can use a spreadsheet, such as Open/LibreOffice Calc [13], Gnumeric [14] from Gnome, or the standalone programs Pyspread [15] or Sc [16]. Calc even offers a suitable template free of charge [17].
Regardless of the spreadsheet, you'll need to enter the short formulas necessary for the calculations. These formulas allow you to calculate the number
(of trips) multiplied by the distance
, as well as the total of all intermediate values. In the example shown in Figure 2, the corresponding expression is sum(A2*D2,A3*D3,A4*D4)
.
The sum()
function interprets the values inside brackets as a list of individual values. The entry A2
refers to the second row in column A of the current sheet.
Buy this article as PDF
Pages: 6
(incl. VAT)