Combining commands with pipes and redirection

Lead Image © Aliaksandr Herasimau, 123RF.com

Pipe Time

Special tools in the shell help you combine commands to create impromptu applications.

The Linux command line provides hundreds of small utilities to read, write, parse, and analyze data. With just a few extra keystrokes, you can combine those utilities into innumerable impromptu applications. For example, imagine you must extract an actor's lines. That is to say, given the text shown in Listing 1, you must produce "That's what they call a sanity clause." for Groucho. The grep command can find substrings, strings, and patterns in a text file. You can use grep to find all lines that begin with GROUCHO. Then you can use cut to divide the matching lines into pieces and combine the two commands with a pipe (|):

$ grep -i -E '^Groucho' marx.txt | cut -d ':' -f 2
That's what they call a sanity clause.

Listing 1

marx.txt

GROUCHO: That's what they call a sanity clause.
CHICO: Ah, you fool wit me. There ain't no Sanity Claus!

The grep clause searches the file marx.txt for all occurrences of "Groucho" that appear at the beginning of a line (-E '^Groucho'), ignoring differences in case (-i). The cut clause separates the line into fields delimited by a colon (-d ':') and selects the second field (-f 2); because cut splits only at the colon, the selected field still begins with the space that follows it. The pipe operator turns the output of the grep clause into the input of the cut clause.

A pipe connects any two commands, and you can construct a long chain of commands with many pipes. For example, if you want to count the number of words Groucho speaks, you can append the clause | wc -w to the previous command.
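
The three-stage pipeline described above can be sketched as follows. This is a minimal illustration rather than part of the original listings: it recreates the two lines of Listing 1 in a scratch directory (the variable names workdir and word_count are assumptions made for the example) so that it can run anywhere.

```shell
# Sketch: recreate the two-line sample from Listing 1 in a scratch directory
workdir=$(mktemp -d)
cat > "$workdir/marx.txt" <<'EOF'
GROUCHO: That's what they call a sanity clause.
CHICO: Ah, you fool wit me. There ain't no Sanity Claus!
EOF

# grep selects Groucho's lines, cut keeps the text after the colon,
# and wc -w counts the words that remain
word_count=$(grep -i -E '^Groucho' "$workdir/marx.txt" | cut -d ':' -f 2 | wc -w)
echo "Groucho speaks $word_count words"

rm -r "$workdir"
```

For the sample text, the count is 7. Each stage only sees the output of the stage before it, so wc never counts the speaker's name or Chico's reply.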

The pipe is just one form of redirection. Redirection tools can change the source or the destination of a process's data. The shell offers other forms of redirection, too, and learning how to apply these tools is key to mastering the shell.

Data In, Data Out

If you run grep by itself, it reads data from the standard input device (stdin) and emits results to the standard output device (stdout). Errors are sent to a third channel called the standard error device (stderr).

Typically, the data for stdin is provided by you via the keyboard, and by default, both stdout and stderr are sent to the terminal connected to your shell. However, you can redirect any or all of those conduits. For instance, you can redirect stdin to read data from a file instead of the keyboard. You can also redirect stdout and stderr (separately) to write data somewhere other than the terminal window. As shown previously, you can also redirect the stdout of one command to become the stdin of a subsequent command.

The syntax for redirection depends on the shell you use, but almost all shells support the following operations:

  • < input_file redirects stdin to read data from the named file.
  • > output_file redirects stdout, sending the results of a command or a pipe (but not the errors) to a named file. If the file does not exist, it is created; if the file exists, its contents are overwritten with the results.
  • >> output_file is similar to > but appends stdout to the named file. If the file does not exist, it is created; however, if the file exists, its contents are preserved and amended with the results.
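  • >& output_file works like >, but it captures both stdout and stderr in the specified file, creating the file if necessary and overwriting its contents if it already exists. This shorthand comes from csh and is also accepted by bash and zsh; in a strictly POSIX shell such as dash, write > output_file 2>&1 instead.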
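
To see the operators side by side, consider the following sketch. It assumes a POSIX-style shell, and the function noisy is hypothetical, defined only so that one command writes to both stdout and stderr. The example also uses 2> (redirect stderr alone) and 2>&1 (send stderr wherever stdout currently goes), which are standard shell operators that do not appear in the list above.

```shell
# Sketch: a command that writes one line to stdout and one to stderr.
# The function name "noisy" is hypothetical, used only for illustration.
noisy() {
    echo "result"
    echo "warning" >&2
}

workdir=$(mktemp -d)

# stdout and stderr go to separate files
noisy > "$workdir/out.txt" 2> "$workdir/err.txt"
# append stdout to the existing file and discard stderr
noisy >> "$workdir/out.txt" 2> /dev/null
# both streams land in one file (the portable spelling of >&)
noisy > "$workdir/both.txt" 2>&1

# < feeds each file to wc on stdin
out_lines=$(wc -l < "$workdir/out.txt")
both_lines=$(wc -l < "$workdir/both.txt")
err_text=$(cat "$workdir/err.txt")
echo "out.txt: $out_lines lines, both.txt: $both_lines lines, err.txt: $err_text"

rm -r "$workdir"
```

After the three calls, out.txt holds two lines because the second call appended to it, err.txt holds only the warning, and both.txt holds one line from each stream.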

A few examples are shown in Listing 2.

Listing 2

Redirection Examples

$ # First example
$ grep -i -E '^Groucho' marx.txt | cut -d ':' -f 2  > groucho.txt
$ cat groucho.txt
That's what they call a sanity clause.
$ # Second example
$ cat timecard.txt
I started work on Nov 1 at 8.15 am.
I finished work on Nov 1 at 5 pm.
$ echo 'I started work on Nov 2 at 9 am.' >> timecard.txt
$ cat timecard.txt
I started work on Nov 1 at 8.15 am.
I finished work on Nov 1 at 5 pm.
I started work on Nov 2 at 9 am.
$ # Third example
$ ruby myapp.rb < data >& log

In Listing 2, the first command should look familiar. The addition of > groucho.txt saves the output of the command line to the file groucho.txt. The second command appends the string I started work on Nov 2 at 9 am. to the file timecard.txt. The third command runs the Ruby script myapp.rb. Input is taken from the file named data and the stdout and stderr are captured in log.

Advanced Use of Pipes

Consider the following command-line combination:

$ find /path/to/files \
   -type f | xargs grep -H -I -i  -n string

This command enumerates all plain files in the named path, searches each one for occurrences of the given string, and generates a list of files that contain the string, including the line number and the specific text that matched. The find clause searches the entire hierarchy rooted at /path/to/files, looking for plain files (-type f). Its output is the list of plain files.

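The xargs clause is special: xargs reads the file names that find writes to its stdout and appends them as arguments to a command (here, grep plus everything to the end of the line). By default, xargs passes as many names as fit on one command line, so grep typically runs once for the whole list rather than once per file. The options -H and -n preface each match with the file name and line number, respectively. The option -i ignores case, and -I skips binary files.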

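Assuming that the directory /path/to/src contains the files a, b, and c, using find in combination with xargs produces the same output as searching each file in turn:

$ find /path/to/src -type f
/path/to/src/a
/path/to/src/b
/path/to/src/c
$ grep -H -I -i -n string /path/to/src/a
$ grep -H -I -i -n string /path/to/src/b
$ grep -H -I -i -n string /path/to/src/c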
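
The following sketch shows how xargs groups its arguments and how to keep file names that contain spaces intact. It is an illustration under stated assumptions: the scratch files and variable names are invented for the example, and the -print0 and -0 options, although supported by the GNU and BSD versions of find and xargs, might be missing from very minimal implementations.

```shell
# Sketch: three scratch files, one with a space in its name
workdir=$(mktemp -d)
printf 'A string here\n' > "$workdir/a"
printf 'nothing to see\n' > "$workdir/b"
printf 'Another STRING\n' > "$workdir/my notes"

# -print0 and -0 separate file names with NUL bytes, so the space is harmless;
# xargs hands all of the names to grep as arguments
matches=$(find "$workdir" -type f -print0 | xargs -0 grep -H -I -i -n string)
echo "$matches"
match_count=$(printf '%s\n' "$matches" | wc -l)

# echo stands in for grep to show how xargs groups its arguments:
# by default one invocation receives everything, -n 1 forces one per argument
default_runs=$(printf 'a\nb\nc\n' | xargs echo | wc -l)
single_runs=$(printf 'a\nb\nc\n' | xargs -n 1 echo | wc -l)
echo "default: $default_runs run, with -n 1: $single_runs runs"

rm -r "$workdir"
```

With the default settings, echo runs once and prints a single line containing all three names; with -n 1 it runs three times. Because grep is given -H, its output looks the same either way, which is why the per-file commands shown earlier are a fair description of the result.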

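In fact, searching a collection of files is so common that grep has its own option to recurse through a file system hierarchy. In recent versions of GNU grep, use -d recurse or its shorthand -r; the related option -R also recurses and additionally follows every symbolic link it encounters. For example, the command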

$ grep -H -I -i -n -R string /path/to/src

works as well as the combination of find and xargs. However, if you need to be selective and pick specific kinds of files, use find.
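
As a minimal sketch of that kind of selectivity, the example below limits the search to files whose names end in .txt. The directory layout, file names, and variable names are assumptions made for illustration only.

```shell
# Sketch: a small scratch hierarchy with two .txt files and one .log file
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'a string in text\n' > "$workdir/notes.txt"
printf 'a string in a log\n' > "$workdir/sub/run.log"
printf 'STRING in nested text\n' > "$workdir/sub/more.txt"

# -name '*.txt' restricts the search to files whose names end in .txt;
# the quotes keep the shell from expanding the pattern before find sees it
selected=$(find "$workdir" -type f -name '*.txt' | xargs grep -H -I -i -n string)
echo "$selected"
selected_count=$(printf '%s\n' "$selected" | wc -l)

rm -r "$workdir"
```

Only the two .txt files are reported, even though run.log also contains the string, because find never passes the log file to grep.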
