Slicing and dicing data with regular expressions

Lead Image © Jeremy Mayes,

Strings and Things

Regular expressions help you filter through the data to find the information you need.

Most computer systems have an assortment of tools for filtering and processing data. A virus scanner, a spam fighter, a web search engine, a spell checker – each is a filter that sifts through data to isolate the information you really need. Your shell provides a filter, too. For example, ls *.jpg lists only the files whose names end in .jpg, which are typically JPEG images.

Because so much of Linux depends on interpreting and processing plain text files, an entire shorthand exists for creating filters. The shorthand is called regular expressions, or regex. A regex applied to text can find, dissect, and extract virtually any pattern you seek. Table 1 shows some common regex operators, which you can string together and use in combination to build arbitrarily complex filters.
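To make "find, dissect, and extract" concrete, here is a minimal sketch in Python, whose built-in re module uses a Perl-style syntax. The log lines, the pattern, and the extract_disk_errors function are all hypothetical and invented for this illustration; they do not come from Table 1 or from any real system.

```python
import re

# Hypothetical sample log lines, invented for this illustration.
LOG_LINES = [
    "2024-03-01 12:00:05 ERROR disk /dev/sda1 at 91% capacity",
    "2024-03-01 12:00:09 INFO backup finished",
    "2024-03-02 08:15:42 ERROR disk /dev/sdb2 at 97% capacity",
]

# ^ anchors the match to the start of the line; \d{4} means exactly four
# digits; \S+ is one or more non-space characters; \d+ is one or more digits;
# (?P<name>...) captures a piece of the match under a name.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \S+ ERROR disk (?P<device>\S+) at (?P<pct>\d+)%"
)


def extract_disk_errors(lines):
    """Return (date, device, percent) for each line that matches PATTERN."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # find: lines that do not match are filtered out
            # dissect and extract: pull the named pieces out of the match
            results.append((match["date"], match["device"], int(match["pct"])))
    return results


if __name__ == "__main__":
    for row in extract_disk_errors(LOG_LINES):
        print(row)
```

The single pattern does three jobs at once: it skips the INFO line, splits each ERROR line into its parts, and hands back only the date, device name, and percentage.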

The origin of regex dates back some 60 years to research in theoretical computer science, a branch of study that includes the design and analysis of algorithms and the semantics of programming languages. The earliest progenitor described models of computation in a shorthand notation called a "regular expression." The shorthand was first co-opted for use in the QED editor and then in ed, the editor found in the original Unix operating system, and it has since expanded into a POSIX standard for pattern matching. Today, one of the most popular implementations of regex is the Perl-Compatible Regular Expressions library, or PCRE, which is modeled on the regex syntax of Perl. You will find PCRE, or a closely related Perl-style syntax, in Perl, Apache, Ruby, PHP, and many other languages and tools.

