Learning file management commands

Compressing Files

Compression is less essential now than it was in the days of 100MB hard drives, but it remains important for creating backups and sending files as email attachments. Linux systems typically offer four commands for compressing and archiving files: tar, gzip, bzip2, and, more rarely, cpio.

When you exchange files with users of other operating systems, use gzip so they can open the result. Gzip's basic use is straightforward: The command is followed by a list of files, each of which is compressed into its own .gz file. A variety of options lets you control what happens.

To trade speed against the amount of compression, use a numeric option from -1 to -9. The fastest setting is -1, which can also be written --fast, and the setting that produces the smallest files is -9, which can also be written --best. Note that gzip replaces each original file with its compressed version. To preserve the originals, use the -k (--keep) option in recent versions of gzip, or use -c to write the compressed data to standard output and redirect it to a file.
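As a minimal sketch of these options, the following commands compress the same file at the fastest and the most thorough settings while leaving the original in place. The file names and the scratch directory are hypothetical, chosen only for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a compressible sample file (hypothetical name).
yes "the quick brown fox" | head -n 5000 > notes.txt

# -1 favors speed, -9 favors size. -c writes to standard output,
# so notes.txt itself is left in place.
gzip -1 -c notes.txt > notes-fast.gz
gzip -9 -c notes.txt > notes-best.gz

# Compare the sizes of the original and the two compressed copies.
ls -l notes.txt notes-fast.gz notes-best.gz
```

Redirecting the output of -c also lets you choose the name of the compressed file, which plain gzip does not.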

To work with gzip-compressed files without unpacking them first, you can use several utilities:

  • zcat displays the uncompressed contents of a gzip file.
  • zcmp compares gzip-compressed files.
  • zdiff lists differences between gzip-compressed files.
  • zgrep, zegrep, and zfgrep search for text patterns in gzip-compressed files.
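The following sketch shows two of these utilities at work on a small compressed file. The file name words.txt is hypothetical, and the commands assume the gzip helper scripts found on most Linux systems.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small text file and compress it.
printf 'alpha\nbeta\ngamma\n' > words.txt
gzip words.txt            # replaces words.txt with words.txt.gz

# Print the uncompressed contents without creating a file on disk.
zcat words.txt.gz

# Search inside the compressed file, as grep would in a plain one.
zgrep 'beta' words.txt.gz
```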

One especially useful utility is gunzip, which amounts to an alias for gzip -d and accepts most of the same options as gzip. If you would rather not learn another command, you can simply use gzip -d.

By contrast, the bzip2 command produces files that are 10 to 20 percent smaller than those produced by gzip. But, although bzip2 and gzip serve similar purposes, bzip2's options are considerably different. For one thing, you have to name the files in subdirectories yourself, because bzip2 lacks an -r option. For another, you use the -z option to compress files (this is the default) and -d to decompress. To keep the original files after compression, use the -k option.
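One way to work around the missing -r option is to let find walk the directory tree, as in this sketch. The reports directory and its files are hypothetical, and find is a substitute technique, not something bzip2 itself provides.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small directory tree to compress.
mkdir -p reports/2024
printf 'january\n' > reports/summary.txt
printf 'february\n' > reports/2024/detail.txt

# bzip2 has no -r, so find visits the subdirectories and passes
# each regular file to bzip2. -z compresses; -k keeps the original.
find reports -type f -exec bzip2 -z -k {} +

# Decompress one file to standard output with -d and -c.
bzip2 -d -c reports/2024/detail.txt.bz2
```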

Like gzip, bzip2 has some related utilities for working with its files:

  • bzcat displays the uncompressed contents of a bzip2 file, much as cat does for a plain file.
  • bzip2recover helps recover data from damaged bzip2 files.
  • bunzip2 decompresses files.

The differences between gzip and bzip2 can be hard to remember, so many users prefer to rely on the tar command. The tar command not only has the advantage of having options to use gzip and gunzip (-z) or bzip2 (-j), but it also offers far more control over exactly how you compress files.

In fact, tar's options run into the dozens, too many to detail here, but a few are worth knowing. You can use --exclude=<pattern> to leave out files that match a pattern and -p to preserve permissions when extracting, whether for a single file or for an entire directory structure. To be safe when extracting, use -k to prevent any accidental overwriting of existing files.
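A short sketch of these options in combination follows. The project directory, the *.tmp pattern, and the restore directory are hypothetical examples.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p project/src
printf 'code\n' > project/src/main.txt
printf 'scratch\n' > project/notes.tmp

# Create (-c) a gzip-compressed (-z) archive in the file named by -f,
# leaving out anything that matches the pattern *.tmp.
tar -czf project.tar.gz --exclude='*.tmp' project

# List (-t) the archive to confirm what went in.
tar -tzf project.tar.gz

# Extract (-x) into a separate directory, preserving permissions (-p).
mkdir restore
tar -xzpf project.tar.gz -C restore

# Change the restored copy, then extract again with -k. The existing
# file is left alone; tar may report the skipped file as an error,
# hence the "|| true".
printf 'local edit\n' > restore/project/src/main.txt
tar -xzpkf project.tar.gz -C restore || true
cat restore/project/src/main.txt
```

Listing the archive before extracting is a cheap way to check that an exclusion pattern did what you expected.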

The tar command also includes its own built-in utilities in many cases. To add the contents of one archive to the end of another, use the format

tar --concatenate --file=<tarfile1> <tarfile2>

(The similarly named --append option adds ordinary files, not archives.) To update an archive with newer versions of files with the same name, use the -u option; or, to compare the files in an archive with those on disk, use the format:

tar --compare --file=<tarfile> <files>
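The sketch below walks through joining, updating, and comparing archives. It assumes GNU tar, because other tar implementations may lack the long options used here, and it uses uncompressed archives, because these operations do not work on compressed ones. All file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\n' > a.txt
printf 'two\n' > b.txt
tar -cf first.tar a.txt
tar -cf second.tar b.txt

# Add the members of second.tar to the end of first.tar.
tar --concatenate --file=first.tar second.tar
tar -tf first.tar

# Change a.txt, then update the archive: -u appends a file only
# when it is newer than the copy already stored.
sleep 1
printf 'one, revised\n' > a.txt
tar -uf first.tar a.txt

# Compare archive members with the files on disk. Differences are
# reported, and the exit status is nonzero when any are found.
printf 'two, changed\n' > b.txt
tar --compare --file=first.tar || echo "differences found"
```

Note that -u does not replace the older copy inside the archive. It appends the newer one, so a.txt appears twice in the listing, and the later copy wins on extraction.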

The fourth command, cpio, has fallen out of favor in recent years, probably because its command format is unconventional. For example, to create an archive with cpio, you have to pipe a list of file names into it and redirect the output to a file, as in ls | cpio -o > <outputfile.cpio>.

That said, cpio has even more options than tar, including such powerful alternatives as the ability to archive an entire directory tree and create archives in multiple formats (of which TAR is the only one that is widely used), as well as numerous options to view and edit already archived files. But, unless you are a system administrator or an old Unix hand, chances are you will rarely see cpio used.

Extending File Management with Globbing

One reason shell commands are so powerful is that they can work with multiple files. With many commands, the easiest way to work with multiple files is by entering a space-delimited list directly after the command. However, the most concise and efficient way to handle multiple files is through file globbing.

File globbing refers to the use of wildcards, also called metacharacters, that the shell expands into matching file names. It is closely related to regular expressions (often abbreviated to regex), the pattern language used by commands such as grep. The terms are not quite synonymous, although they are often used as if they were. But, whatever term you use, the idea is a string of characters that can stand for many different strings.

The most widely used glob in the Bash shell is the asterisk (*), which stands for any number of unknown characters. This glob is especially useful when you want to find files that share the same extension. For instance, the command ls *.png lists all the .png graphics in the current directory.

By contrast, a question mark (?) stands for any single character. If you enter the command ls ca?.png, the list of matches will include the files cat.png and cab.png but not the file card.png, which contains two characters instead of one after the ca.
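The two wildcards can be tried safely in a scratch directory, as in this sketch. The file names match the examples in the text, and notes.txt is an extra hypothetical file added to show what is not matched.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch cat.png cab.png card.png notes.txt

# * matches any run of characters, so this lists every .png file
# and leaves out notes.txt.
ls *.png

# ? matches exactly one character: cab.png and cat.png match,
# but card.png does not.
ls ca?.png
```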

From these simple beginnings, patterns can quickly become more elaborate. To specify specific characters, you can use square brackets, so that test[12].png locates files test1.png and test2.png, but not test3.png (Figure 8). In regular expressions, such as those used by grep, you can also anchor a search to the start (^) or the end ($) of a line. Similarly, you can search at the start of a word with \< or the end of a word with \>. These are simply a few common possibilities. Constructing patterns is an art form, and experts rightly pride themselves on their ability to construct elaborate and elegant ones.

Figure 8: All you need is a few regular expressions to increase the flexibility of commands. Here, their use greatly simplifies the finding of files.
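The following sketch separates the two kinds of pattern: The square brackets are expanded by the shell into file names, whereas the anchors are regular expressions interpreted by grep. The file lines.txt and its contents are hypothetical, and the word anchors assume a grep that supports them, such as GNU grep.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch test1.png test2.png test3.png
printf 'start here\nnot at start\nthe end\nending soon\n' > lines.txt

# Square brackets match one character from the set, so this lists
# test1.png and test2.png but not test3.png.
ls test[12].png

# In a regular expression, ^ anchors the match to the start of a line.
grep '^start' lines.txt

# $ anchors the match to the end of a line.
grep 'end$' lines.txt

# \< and \> anchor the match to the start and end of a word,
# so "ending" is not matched.
grep '\<end\>' lines.txt
```

The regular expressions are wrapped in single quotes so that the shell passes them to grep unchanged instead of trying to expand them itself.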

But what if you want to work with a metacharacter as an ordinary character? Then you put a backslash (\) in front of it. For instance, \* indicates that you are looking for a literal asterisk, not using a wildcard, and \\ indicates that you are looking for a literal backslash. The backslash is known as an escape character, and it signals that the command should read what follows literally, instead of as a glob.
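To see the escape character in action, the sketch below creates a file whose name really contains an asterisk. The file names are hypothetical, and the single-quote form at the end is an additional way of getting the same literal reading.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'draft*.txt' draft1.txt draft2.txt

# Unescaped, the asterisk is a glob and matches all three files.
ls draft*.txt

# Escaped, it is an ordinary character and matches only the file
# whose name really contains an asterisk.
ls draft\*.txt

# Single quotes have the same effect as the backslash here.
ls 'draft*.txt'
```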

Globs can be especially useful when you want a selected list from a directory full of files or when you are using one of the grep commands to find content inside a file. However, you must be careful about using globs with commands like rm or mv that change or rearrange the content of your hard drive. Otherwise, a command can have disastrous consequences. To be safe, consider using a newly constructed glob with the innocuous ls command, so you can see which files it might affect.
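The preview habit described above might look like this in practice. The file names are hypothetical, and the commands run in a scratch directory so that nothing of value is deleted.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch report1.bak report2.bak report.txt

# Step 1: preview the glob with a harmless command.
ls report*.bak

# Step 2: only after checking the list, reuse the same glob with rm.
rm report*.bak

# Confirm that only the intended files are gone.
ls
```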
