Perl Special Character Classes in Regular Expressions

Regular expressions in Perl offer a rich set of character classes to help match specific groups of characters. In addition to the basic character classes like \d, \w, and \s, Perl provides a variety of special character classes that are extremely useful in many contexts.

1. Introduction

A character class matches a single character drawn from a defined set. A class is written either as a set in square brackets, such as [abc] or [a-z], or as a backslash followed by a letter, such as \d.

2. Basic Character Classes

  • \d: Matches a decimal digit (0-9, plus digits from other scripts unless the /a modifier is in effect).
  • \D: Matches a non-digit.
  • \w: Matches a word character (letters, digits, and underscore).
  • \W: Matches a non-word character.
  • \s: Matches whitespace (spaces, tabs, newlines, carriage returns, form feeds).
  • \S: Matches non-whitespace.

3. POSIX Character Classes

Perl supports POSIX character classes. Each is written as [:name:] and is valid only inside a bracketed character class, so a complete pattern looks like [[:alpha:]]:

  • [:alpha:]: Matches any alphabetical character.
  • [:digit:]: Matches any numeric character (equivalent to \d).
  • [:alnum:]: Matches alphanumeric characters.
  • [:lower:]: Matches lowercase characters.
  • [:upper:]: Matches uppercase characters.
  • [:punct:]: Matches punctuation characters.
  • [:space:]: Matches whitespace characters, similar to \s.
  • [:blank:]: Matches space and tab.
  • [:cntrl:]: Matches control characters.
  • [:graph:]: Matches characters that have a visible representation (excluding spaces).
  • [:print:]: Matches printable characters (including spaces).

Usage:

if ($string =~ /[[:alpha:]]/) {
    print "The string contains an alphabetical character.";
}
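
The sketch below shows several POSIX classes in use, along with the negated form [[:^digit:]]. The helper names classify and has_non_digit are made up for illustration and are not part of Perl:

```perl
use strict;
use warnings;

# Hypothetical helper: describe a one-character string using POSIX classes.
sub classify {
    my ($char) = @_;
    return 'letter'      if $char =~ /^[[:alpha:]]$/;
    return 'digit'       if $char =~ /^[[:digit:]]$/;
    return 'punctuation' if $char =~ /^[[:punct:]]$/;
    return 'blank'       if $char =~ /^[[:blank:]]$/;
    return 'other';
}

# [[:^digit:]] is the negated form: any character that is not a digit.
sub has_non_digit {
    my ($string) = @_;
    return $string =~ /[[:^digit:]]/ ? 1 : 0;
}

print classify('a'), "\n";          # letter
print classify('7'), "\n";          # digit
print classify('!'), "\n";          # punctuation
print has_non_digit('12a4'), "\n";  # 1
```

Writing /[:alpha:]/ without the outer brackets does not use the POSIX class at all; Perl reads it as an ordinary bracketed set of the characters ":", "a", "l", "p", and "h".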

4. Unicode Properties

Perl also provides Unicode property escapes, which match characters by their Unicode properties rather than by explicit ranges:

  • \p{Property}: Matches a character with a certain Unicode property.
  • \P{Property}: Matches a character without a certain Unicode property.

For example:

  • \p{L}: Matches any kind of letter from any language.
  • \p{Lu}: Matches an uppercase letter.
  • \p{Ll}: Matches a lowercase letter.
  • \p{Nd}: Matches a decimal digit in any script.

Usage:

if ($string =~ /\p{L}/) {
    print "The string contains a letter from some language.";
}
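
As a rough illustration of the properties listed above, the following sketch tallies the characters of a string by general category. The helper name count_categories and the sample string are assumptions chosen for the example; the non-ASCII characters are written as \x{...} escapes so the script does not depend on the source file's encoding:

```perl
use strict;
use warnings;

# Hypothetical helper: count characters by Unicode general category.
sub count_categories {
    my ($string) = @_;
    my %count = (upper => 0, lower => 0, digit => 0, other => 0);
    for my $char (split //, $string) {
        if    ($char =~ /\p{Lu}/) { $count{upper}++ }
        elsif ($char =~ /\p{Ll}/) { $count{lower}++ }
        elsif ($char =~ /\p{Nd}/) { $count{digit}++ }
        else                      { $count{other}++ }
    }
    return \%count;
}

# \x{0416} is CYRILLIC CAPITAL LETTER ZHE, \x{0436} is its lowercase form,
# and \x{0663} is ARABIC-INDIC DIGIT THREE.
my $sample = "A\x{0416}\x{0436}z7\x{0663}-";
my $count  = count_categories($sample);

# Prints: digit=2, lower=2, other=1, upper=2
print join(', ', map { "$_=$count->{$_}" } sort keys %$count), "\n";
```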

5. ASCII and Unicode Mode

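By default, \d, \w, and \s are not limited to ASCII. \d matches decimal digits from any script, and \w and \s follow Unicode rules when the /u modifier is in effect (use feature 'unicode_strings'; makes /u the default) or when the string is stored internally as UTF-8. To restrict all three classes to ASCII characters, add the /a modifier, available since Perl 5.14.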
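
A small sketch of how the modifiers change what these classes accept, assuming Perl 5.14 or later for /a and /u. The helper names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helpers: the same class tested under different modifiers.
sub is_digit_default { return $_[0] =~ /^\d\z/  ? 1 : 0 }
sub is_digit_ascii   { return $_[0] =~ /^\d\z/a ? 1 : 0 }
sub is_word_unicode  { return $_[0] =~ /^\w\z/u ? 1 : 0 }
sub is_word_ascii    { return $_[0] =~ /^\w\z/a ? 1 : 0 }

my $arabic_three = "\x{0663}";  # ARABIC-INDIC DIGIT THREE
my $e_acute      = "\x{E9}";    # LATIN SMALL LETTER E WITH ACUTE

# Prints: digit: default=1 /a=0
printf "digit: default=%d /a=%d\n",
    is_digit_default($arabic_three), is_digit_ascii($arabic_three);

# Prints: word:  /u=1 /a=0
printf "word:  /u=%d /a=%d\n",
    is_word_unicode($e_acute), is_word_ascii($e_acute);
```

When input must be plain ASCII digits, for example before passing a captured value to arithmetic, /a or an explicit [0-9] is the safer choice.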

6. Combining Character Classes

Character classes can be combined for more complex matches:

# Match a sequence of three letters followed by two digits:
if ($string =~ /[[:alpha:]]{3}\d{2}/) {
    print "Pattern matched!";
}
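
Classes can also be mixed inside a single pair of brackets. The sketch below, with hypothetical helper names and sample strings, captures the two parts of a "three letters plus two digits" code and then checks a string against one bracketed class that combines a POSIX class, \d, and two literal characters:

```perl
use strict;
use warnings;

# Hypothetical helper: pull a "three letters + two digits" code out of a string.
sub extract_code {
    my ($string) = @_;
    # The \b anchors keep the match from starting or ending inside a longer word.
    return $string =~ /\b([[:alpha:]]{3})(\d{2})\b/ ? ($1, $2) : ();
}

# Hypothetical helper: one bracketed class mixing a POSIX class,
# a backslash class, and the literal characters "_" and "-".
sub is_tag {
    my ($string) = @_;
    return $string =~ /^[[:upper:]\d_-]+$/ ? 1 : 0;
}

my ($letters, $digits) = extract_code("Order ABC42 shipped");
print "letters=$letters digits=$digits\n";        # letters=ABC digits=42
print is_tag("SKU_9-A"), is_tag("sku 9"), "\n";   # 10
```

The hyphen is placed last in the bracketed class so that it is taken literally instead of forming a range.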

7. Negating Character Classes

Use ^ as the first character inside a character class to negate it:

# Match a character that's not a digit:
if ($string =~ /[^0-9]/) {
    print "Found a non-digit character!";
}
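
Negation is available in each notation covered so far. As a sketch, the hypothetical helper below finds the first non-letter in a string four different ways:

```perl
use strict;
use warnings;

# Hypothetical helper: find the first non-letter using four negated forms.
sub first_non_letter {
    my ($string) = @_;
    my %found;
    ($found{bracket}) = $string =~ /([^a-zA-Z])/;     # negated bracketed set
    ($found{posix})   = $string =~ /([[:^alpha:]])/;  # negated POSIX class
    ($found{nested})  = $string =~ /([^[:alpha:]])/;  # ^ applied to a set holding a POSIX class
    ($found{unicode}) = $string =~ /(\P{L})/;         # negated Unicode property
    return \%found;
}

my $found = first_non_letter("abc-42");

# Prints: bracket='-', nested='-', posix='-', unicode='-'
print join(', ', map { "$_='$found->{$_}'" } sort keys %$found), "\n";
```

The forms agree on ASCII input like this, but [^a-zA-Z] also treats accented and non-Latin letters as non-letters, which \P{L} does not.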

8. Summary

Special character classes in Perl's regular expressions provide a powerful way to match specific types of characters. Whether you're working with ASCII data or handling Unicode, Perl's regex character classes offer both flexibility and precision in text processing.

  1. Using \d, \w, \s in Perl regex:

    • Description: \d matches any digit, \w matches any word character, and \s matches any whitespace character.
    • Code Example:
      my $text = "The cat is 42 years old.";
      
      # Capture a run of digits, the whitespace after it, and the next word.
      if ($text =~ /(\d+)(\s+)(\w+)/) {
          print "Number: $1, Whitespace: '$2', Word: $3\n";
      }
      
  2. Negating special character classes in Perl:

    • Description: Using \D, \W, \S to match anything except digits, word characters, and whitespace characters, respectively.
    • Code Example:
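      my $text = "42 apples, 3# bananas!";
      
      # Match each negated class on its own; each capture is the first run found.
      my ($non_digits) = $text =~ /(\D+)/;
      my ($non_word)   = $text =~ /(\W+)/;
      my ($non_space)  = $text =~ /(\S+)/;
      
      print "Non-digits: '$non_digits', Non-word characters: '$non_word', Non-whitespace characters: '$non_space'\n";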
      
  3. Perl regex \D, \W, \S usage:

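    • Description: Using \D, \W, and \S in a substitution, a split, and a capture.
    • Code Example:
      my $text = "1234 abc XYZ \t ";
      
      (my $digits = $text) =~ s/\D+//g;     # delete every run of non-digits
      my @words  = split /\W+/, $text;      # split on runs of non-word characters
      my ($last) = $text =~ /(\S+)\s*$/;    # last run of non-whitespace
      
      print "Digits: '$digits', Words: '@words', Last token: '$last'\n";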
      
  4. Custom character classes in Perl regex:

    • Description: Creating custom character classes using square brackets [ ].
    • Code Example:
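      my $text = "apple, banana, cherry!";
      
      # With /g in list context, the match returns every vowel, not just the first.
      my @vowels = $text =~ /([aeiou])/g;
      if (@vowels) {
          print "Vowels: '@vowels'\n";
      }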
      
  5. Unicode character classes in Perl regex:

    • Description: Using Unicode character classes like \p{L} to match any letter.
    • Code Example:
      use utf8;
      binmode(STDOUT, ':encoding(UTF-8)');  # print wide characters as UTF-8
      
      my $text = "Привет, こんにちは, Hello!";
      
      if ($text =~ /(\p{L}+)/) {
          print "Letters: '$1'\n";
      }
      
  6. Special character classes and metacharacters in Perl:

    • Description: Combining the . metacharacter (any character except a newline, unless the /s modifier is used) with character classes.
    • Code Example:
      my $text = 'abc-123!@#';
      
      # \w+ takes "abc", . takes the "-", and \d+ takes "123".
      if ($text =~ /(\w+).(\d+)/) {
          print "Word: '$1', Digits: '$2'\n";
      }
      
  7. Case-insensitive matching with character classes in Perl:

    • Description: Using the /i modifier for case-insensitive matching.
    • Code Example:
      my $text = "Hello, hElLo, HELLO!";
      
      if ($text =~ /hello/i) {
          print "Case-insensitive match found.\n";
      }
      
  8. Word boundary and non-word boundary in Perl regex:

    • Description: Using \b for word boundary and \B for non-word boundary.
    • Code Example:
      my $text = "word boundaries";
      
      if ($text =~ /\bword\b/) {
          print "Word 'word' found with word boundary.\n";
      }
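
The example above exercises only \b. As a complementary sketch, the hypothetical helper count_matches below counts how often "cat" occurs as a whole word, strictly inside a word, and at the end of a word:

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping matches of a compiled pattern.
sub count_matches {
    my ($text, $pattern) = @_;
    my $count = () = $text =~ /$pattern/g;
    return $count;
}

my $text = "cat concatenate bobcat";

print count_matches($text, qr/\bcat\b/), "\n";  # 1: the standalone word "cat"
print count_matches($text, qr/\Bcat\B/), "\n";  # 1: inside "concatenate"
print count_matches($text, qr/cat\b/),   "\n";  # 2: "cat" and the end of "bobcat"
print count_matches($text, qr/cat/),     "\n";  # 3: every occurrence
```

\B succeeds wherever \b fails, so \Bcat\B requires a word character on both sides of "cat".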