Regex Character Classes in Perl

Regular expressions are a core feature of Perl, and understanding character classes is fundamental to crafting effective patterns. Here's a tutorial on regex character classes in Perl:

1. What is a Character Class?

A character class allows you to specify a set of characters. It matches exactly one of the characters within the set.

Syntax:

[...]
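A quick way to see the "exactly one character" rule is to test a class against a few words. In this minimal sketch, the subroutine name `matches_gray` is hypothetical and chosen only for illustration. The class `[ae]` consumes exactly one character, so "graey" (two characters in that position) and "gry" (none) are both rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only for "gray" or "grey".
# [ae] consumes exactly one character, which must be 'a' or 'e'.
sub matches_gray {
    my ($word) = @_;
    return $word =~ /^gr[ae]y$/ ? 1 : 0;
}

for my $word (qw(gray grey graey gry)) {
    printf "%-6s %s\n", $word, matches_gray($word) ? "matches" : "no match";
}
```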

2. Basic Character Classes:

  • [abc]: Matches any one of the characters a, b, or c.
  • [a-z]: Matches any lowercase ASCII letter (a through z).
  • [A-Z]: Matches any uppercase ASCII letter (A through Z).
  • [0-9]: Matches any ASCII digit.
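The basic classes above can be used to count character categories in a string. This sketch assumes core Perl only; the helper names `count_lower`, `count_upper`, and `count_digit` are hypothetical. It relies on the common `() =` idiom: the match runs in list context with `/g`, returning every hit, and assigning that list to a scalar yields the number of hits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: count characters that fall into one bracket class.
# "() =" forces list context so /g returns every match; the scalar
# assignment then gives the number of matches.
sub count_lower { my $n = () = $_[0] =~ /[a-z]/g; return $n }
sub count_upper { my $n = () = $_[0] =~ /[A-Z]/g; return $n }
sub count_digit { my $n = () = $_[0] =~ /[0-9]/g; return $n }

my $text = "Perl 5.36 has Unicode Support";
printf "lower=%d upper=%d digits=%d\n",
    count_lower($text), count_upper($text), count_digit($text);
```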

3. Combining Ranges:

  • [a-zA-Z]: Matches any ASCII letter, uppercase or lowercase.
  • [0-9a-fA-F]: Matches any hexadecimal digit.
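Combined ranges are handy for validation. The sketch below uses a hypothetical `is_hex` helper to accept only non-empty strings made entirely of hexadecimal digits. Anchoring with `\A` and `\z` forces every character to belong to the class; `\z` is used instead of `$` so that a trailing newline is not silently accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validator: true for a non-empty string of hex digits only.
# \A and \z anchor the whole string, so every character must be in the class.
sub is_hex {
    my ($s) = @_;
    return $s =~ /\A[0-9a-fA-F]+\z/ ? 1 : 0;
}

for my $candidate ('ff00AA', 'DEADbeef', '0x1F', 'cafe!') {
    print "$candidate: ", (is_hex($candidate) ? "hex" : "not hex"), "\n";
}
```

Note that a `0x` prefix is rejected because `x` is not in the class; strip it first if your input may carry one.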

4. Negating Character Classes:

By placing a caret (^) immediately after the opening bracket, you negate the class: it then matches any single character not in the set, including a newline. A caret anywhere else in the class is an ordinary literal ^.

  • [^a-z]: Matches any character that is not a lowercase ASCII letter.
  • [^0-9]: Matches any character that is not an ASCII digit.
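A negated class is often used with a substitution to throw away unwanted characters. In this sketch the helper name `digits_only` is hypothetical; it deletes every character that is not an ASCII digit. The last lines show that a caret which is not first in the class is just a literal `^`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only ASCII digits by deleting everything
# matched by the negated class [^0-9].
sub digits_only {
    my ($s) = @_;
    (my $copy = $s) =~ s/[^0-9]//g;
    return $copy;
}

print digits_only("(555) 123-4567"), "\n";

# In [a^] the caret is not first, so it is a literal '^', not a negation.
my $caret_is_literal = ("^" =~ /^[a^]$/) ? 1 : 0;
print "caret inside [a^] is literal: $caret_is_literal\n";
```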

5. Predefined Character Classes:

Perl regex offers predefined shortcuts for commonly used character classes:

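  • \d: Matches any decimal digit. Under the /a modifier it is exactly [0-9]; under Unicode rules it also matches other decimal digits, such as Arabic-Indic digits.
  • \D: Matches any character that \d does not match.
  • \w: Matches any word character (letters, digits, and underscore). Under /a it is exactly [a-zA-Z0-9_]; under Unicode rules it also matches non-ASCII letters such as é.
  • \W: Matches any non-word character.
  • \s: Matches any whitespace character (space, tab, newline, carriage return, form feed, and others).
  • \S: Matches any non-whitespace character.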
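The difference between ASCII and Unicode rules is easy to demonstrate. This sketch uses the `/u` modifier to request Unicode rules explicitly and `/a` to restrict `\d`, `\w`, and `\s` to ASCII. The two sample characters are U+0663 (ARABIC-INDIC DIGIT THREE) and U+00E9 (e with an acute accent); they are chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE; U+00E9 is "e" with an acute accent.
my $arabic_three = "\x{0663}";
my $e_acute      = "\x{00E9}";

# /u forces Unicode rules: \d and \w accept non-ASCII digits and letters.
my $d_unicode = ($arabic_three =~ /\d/u) ? 1 : 0;
my $w_unicode = ($e_acute      =~ /\w/u) ? 1 : 0;

# /a restricts \d, \w and \s to their ASCII meanings.
my $d_ascii = ($arabic_three =~ /\d/a) ? 1 : 0;
my $w_ascii = ($e_acute      =~ /\w/a) ? 1 : 0;

print "\\d on U+0663: unicode=$d_unicode ascii=$d_ascii\n";
print "\\w on U+00E9: unicode=$w_unicode ascii=$w_ascii\n";
```

If your input is meant to be plain ASCII (IDs, hex, port numbers), adding `/a` keeps `\d` and `\w` from accepting look-alike characters from other scripts.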

6. POSIX Character Classes:

Perl also supports POSIX-style character classes. These classes are more descriptive:

  • [:alpha:]: Matches any alphabetic character. Under /a this is exactly [a-zA-Z]; under Unicode rules it also matches non-ASCII letters.
  • [:digit:]: Matches any digit. Same as \d, so exactly [0-9] under /a.
  • [:alnum:]: Matches any alphanumeric character. Under /a this is exactly [a-zA-Z0-9].
  • [:space:]: Matches any whitespace character (in modern Perl, the same set as \s).
  • [:punct:]: Matches any punctuation character.
  • [:lower:]: Matches any lowercase alphabetic character.
  • [:upper:]: Matches any uppercase alphabetic character.

To use a POSIX character class, embed it within a bracket expression. For example: [[:digit:]].
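Several POSIX classes can share one bracket expression, and Perl also accepts a negated form written `[:^name:]`. The sketch below applies three such classes to a sample identifier; the string `user_42!` is illustrative only. Note that the underscore counts as punctuation for `[:punct:]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $id = "user_42!";

# Two POSIX classes combined in one bracket expression (ASCII rules via /a).
my @alnum = $id =~ /[[:alpha:][:digit:]]/ga;

# Perl extension: [:^digit:] matches anything that is not a digit.
my @non_digits = $id =~ /[[:^digit:]]/g;

# [:punct:] picks out punctuation, which includes '_' and '!'.
my @punct = $id =~ /[[:punct:]]/g;

print "alnum: @alnum\n";
print "non-digits: @non_digits\n";
print "punct: @punct\n";
```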

7. Examples:

Here are some Perl snippets that use character classes:

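#!/usr/bin/perl
use strict;
use warnings;

# Single quotes keep "$45" literal; in double quotes Perl would try to
# interpolate the (undefined) capture variable $45.
my $str = 'Price: $45';

# \d+ captures the first run of digits.
if ($str =~ /(\d+)/) {
    print "The price is $1.\n";
}

# [^\d\s]+ matches the first run of characters that are neither digits nor
# whitespace. Capturing it avoids $&, which slows every match on Perl < 5.20.
if ($str =~ /([^\d\s]+)/) {
    print "Found non-digit, non-whitespace sequence: $1\n";
}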

Summary:

Character classes are a vital tool in regex, allowing you to match specific sets of characters. Whether you're using basic bracket notation or predefined classes, understanding these patterns helps make your Perl regexes more effective and readable.

  1. Using character classes in Perl regex patterns:

    • Description: Character classes allow you to match any one of a set of characters at a particular position in the string.
    • Code Example:
      if ($string =~ /[aeiou]/) {
          print "Vowel found!\n";
      }
      
  2. Defining custom character classes in Perl:

    • Description: You can define custom character classes by placing characters inside square brackets.
    • Code Example:
      if ($string =~ /[A-Za-z]/) {
          print "Alphabetic character found!\n";
      }
      
  3. Negating character classes in Perl:

    • Description: You can negate a character class by placing a caret (^) at the beginning.
    • Code Example:
      if ($string =~ /[^0-9]/) {
          print "Non-digit character found!\n";
      }
      
  4. Predefined character classes in Perl regex:

    • Description: Perl provides shorthand character classes like \d (digits), \w (word characters), and \s (whitespace).
    • Code Example:
      if ($string =~ /\d/) {
          print "Digit found!\n";
      }
      
  5. Character class metacharacters in Perl:

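    • Description: Inside a character class most regex metacharacters (such as . * + ?) lose their special meaning. The main exceptions are - between two characters (a range), ^ as the first character (negation), ] (closes the class), and \ (escapes the next character).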
    • Code Example:
      if ($string =~ /[0-9a-f]/) {
          print "Hexadecimal digit found!\n";
      }
      
  6. Matching digits with \d in Perl regex:

    • Description: \d matches any decimal digit: 0-9, plus other Unicode decimal digits unless the /a modifier restricts it to ASCII.
    • Code Example:
      if ($string =~ /\d/) {
          print "Digit found!\n";
      }
      
  7. Matching word characters with \w in Perl regex:

    • Description: \w matches any word character (alphanumeric + underscore) in Perl regex patterns.
    • Code Example:
      if ($string =~ /\w/) {
          print "Word character found!\n";
      }
      
  8. Matching whitespace characters with \s in Perl regex:

    • Description: \s matches any whitespace character (space, tab, newline) in Perl regex patterns.
    • Code Example:
      if ($string =~ /\s/) {
          print "Whitespace character found!\n";
      }
      
  9. Unicode character classes in Perl regex:

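    • Description: Perl supports Unicode property classes, written \p{...} (and negated as \P{...}), for example to match characters from a particular script.
    • Code Example:
      my $string = "alpha is \x{03B1}";   # U+03B1 GREEK SMALL LETTER ALPHA
      if ($string =~ /\p{Greek}/) {
          print "Greek character found!\n";
      }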
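To make point 5 concrete, this sketch shows characters that are special elsewhere in a regex becoming literal inside a class, plus the usual ways to get a literal hyphen or bracket. The sample strings are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inside [...], '.' and '*' are literal: they match only a dot or an asterisk.
my @dots_stars = "a.b*c" =~ /[.*]/g;

# A hyphen is literal when it is the first or last character of the class.
my @signs = "-5 +3" =~ /[+-]/g;

# Square brackets need a backslash to be matched literally inside a class.
my @brackets = "[x]" =~ /[\[\]]/g;

print "dots/stars: @dots_stars\n";
print "signs: @signs\n";
print "brackets: @brackets\n";
```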