Regular expressions in Perl offer a rich set of character classes to help match specific groups of characters. In addition to the basic character classes like \d, \w, and \s, Perl provides a variety of special character classes that are useful in many contexts.
Character classes in regex are used to match specific types of characters. They are either enclosed in square brackets [...] or written as a backslash followed by a character (e.g., \d).
\d: Matches a digit (0-9).
\D: Matches a non-digit.
\w: Matches a word character (alphanumeric characters plus underscore).
\W: Matches a non-word character.
\s: Matches whitespace (spaces, tabs, newlines).
\S: Matches non-whitespace.

Perl supports POSIX character classes, which can be useful for matching certain groups of characters:

[:alpha:]: Matches any alphabetical character.
[:digit:]: Matches any numeric character (equivalent to \d).
[:alnum:]: Matches alphanumeric characters.
[:lower:]: Matches lowercase characters.
[:upper:]: Matches uppercase characters.
[:punct:]: Matches punctuation characters.
[:space:]: Matches whitespace characters, similar to \s.
[:blank:]: Matches space and tab.
[:cntrl:]: Matches control characters.
[:graph:]: Matches characters that have a visible representation (excluding spaces).
[:print:]: Matches printable characters (including spaces).

Usage (a POSIX class is only valid inside a bracketed character class, hence the double brackets):

    if ($string =~ /[[:alpha:]]/) {
        print "The string contains an alphabetical character.";
    }
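To make the POSIX classes above more concrete, here is a minimal sketch that counts how many characters of a string fall into a few of them. The helper name classify_chars and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters of a string that belong to
# a few POSIX classes. Each class sits inside its own bracketed class.
sub classify_chars {
    my ($string) = @_;
    my %count = (
        # "() = ..." forces list context; scalar() then yields the match count.
        alpha => scalar(() = $string =~ /[[:alpha:]]/g),
        digit => scalar(() = $string =~ /[[:digit:]]/g),
        punct => scalar(() = $string =~ /[[:punct:]]/g),
        space => scalar(() = $string =~ /[[:space:]]/g),
    );
    return \%count;
}

my $counts = classify_chars("Room 42, floor 3!");
print "$_: $counts->{$_}\n" for sort keys %$counts;
# Prints:
# alpha: 9
# digit: 3
# punct: 2
# space: 3
```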
Perl also provides Unicode property escapes, which match characters based on their Unicode properties:

\p{Property}: Matches a character with a certain Unicode property.
\P{Property}: Matches a character without a certain Unicode property.

For example:

\p{L}: Matches any kind of letter from any language.
\p{Lu}: Matches an uppercase letter.
\p{Ll}: Matches a lowercase letter.
\p{Nd}: Matches a decimal digit.

Usage:

    if ($string =~ /\p{L}/) {
        print "The string contains a letter from some language.";
    }
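As a rough illustration of the property escapes listed above, the following sketch extracts uppercase letters from a string regardless of script. The helper name uppercase_letters and the sample text are assumptions; the example also assumes the source file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                               # this file contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');    # encode wide characters on output

# Hypothetical helper: return every uppercase letter (Unicode property Lu)
# found in the string, in order of appearance.
sub uppercase_letters {
    my ($string) = @_;
    return $string =~ /\p{Lu}/g;
}

my @caps = uppercase_letters("Ünïcode Ωmega and Zürich");
print "Uppercase letters: @caps\n";     # Uppercase letters: Ü Ω Z
```

Using \P{Lu} in the same pattern would return everything except uppercase letters.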
Under the /a regex modifier, the character classes \d, \s, and \w match only ASCII characters. Under the /u modifier, or within the scope of use feature 'unicode_strings';, they follow Unicode rules and match the full Unicode ranges for those classes.
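The difference between ASCII and Unicode matching can be seen with a non-ASCII digit. This is a small sketch, assuming Perl 5.14 or later (where the /a modifier is available); the variable names are illustrative.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode decimal digit, but not ASCII.
my $arabic_three = "\x{0663}";

# The string holds a code point above 255, so Unicode rules apply by default.
my $unicode_match = $arabic_three =~ /^\d$/  ? 1 : 0;

# /a restricts \d to the ASCII digits 0-9.
my $ascii_match   = $arabic_three =~ /^\d$/a ? 1 : 0;

print "Default: $unicode_match, with /a: $ascii_match\n";
# Prints: Default: 1, with /a: 0
```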
Character classes can be combined for more complex matches:

    # Match a sequence of three letters followed by two digits:
    if ($string =~ /[[:alpha:]]{3}\d{2}/) {
        print "Pattern matched!";
    }
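The pattern above matches anywhere inside a longer string. A stricter variant, sketched here with a hypothetical parse_code helper, anchors the pattern and captures both parts; the "three letters, two digits" code format is only an example.

```perl
use strict;
use warnings;

# Hypothetical format: exactly three letters followed by two digits.
# \A and \z anchor the pattern to the whole string;
# /a keeps [[:alpha:]] and \d limited to ASCII.
sub parse_code {
    my ($code) = @_;
    return $code =~ /\A([[:alpha:]]{3})(\d{2})\z/a ? ($1, $2) : ();
}

my ($letters, $digits) = parse_code("ABC12");
print "Letters: $letters, digits: $digits\n";   # Letters: ABC, digits: 12
```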
Use ^ as the first character inside a character class to negate it:

    # Match a character that's not a digit:
    if ($string =~ /[^0-9]/) {
        print "Found a non-digit character!";
    }
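Negated classes are handy both for validation and for cleanup. The two helpers below are illustrative sketches (their names and the "slug" idea are assumptions, not part of the original text).

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is non-empty and contains
# no character outside 0-9.
sub is_all_digits {
    my ($string) = @_;
    return 0 if $string eq '';
    return $string =~ /[^0-9]/ ? 0 : 1;
}

# Hypothetical helper: remove everything that is not an ASCII letter,
# a digit, or a hyphen (the trailing - in the class is a literal hyphen).
sub strip_to_slug {
    my ($string) = @_;
    (my $slug = $string) =~ s/[^A-Za-z0-9-]//g;
    return $slug;
}

print is_all_digits("2024"), "\n";               # 1
print is_all_digits("20a4"), "\n";               # 0
print strip_to_slug("Hello, World-42!"), "\n";   # HelloWorld-42
```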
Special character classes in Perl's regular expressions provide a powerful way to match specific types of characters. Whether you're working with ASCII data or handling Unicode, Perl's regex character classes offer both flexibility and precision in text processing.
Using \d, \w, \s in Perl regex:
\d matches any digit, \w matches any word character, and \s matches any whitespace character.

    my $text = "A 42-year-old cat.";
    # "42" and "year" are joined by hyphens, so the pattern spells them out.
    if ($text =~ /(\d+)-(\w+)-old(\s+)/) {
        print "Number: $1, Word: $2, Whitespace: '$3'\n";
    }
Negating special character classes in Perl:
Use \D, \W, \S to match anything except digits, word characters, and whitespace characters, respectively.

    my $text = "42 apples, 3# bananas!";
    # $1 = ' apples, ', $2 = '# ', $3 = 'bananas!'
    if ($text =~ /(\D+)\d(\W+)(\S+)/) {
        print "Non-digits: '$1', Non-word characters: '$2', Non-whitespace characters: '$3'\n";
    }
Perl regex \D, \W, \S usage:
Combine \D, \W, \S in a single regex pattern.

    my $text = "1234 abc XYZ \t ";
    # $1 = ' abc', $2 = ' ', $3 = 'XYZ'
    if ($text =~ /(\D+)(\W+)(\S+)/) {
        print "Non-digits: '$1', Non-word characters: '$2', Non-whitespace characters: '$3'\n";
    }
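Custom character classes in Perl regex:
Define your own set of characters with [ ].

    my $text = "apple, banana, cherry!";
    # Captures the first run of vowels only: the "a" in "apple".
    if ($text =~ /([aeiou]+)/) {
        print "Vowels: '$1'\n";
    }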
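Custom classes can also hold ranges and can be matched globally. As a short sketch (the helper names and the "#RRGGBB" colour format are assumptions chosen for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: check for a "#RRGGBB" hex colour.
# [0-9A-Fa-f] combines three ranges into one custom class.
sub is_hex_colour {
    my ($string) = @_;
    return $string =~ /\A#[0-9A-Fa-f]{6}\z/ ? 1 : 0;
}

# Hypothetical helper: count every vowel, upper or lower case.
sub count_vowels {
    my ($string) = @_;
    my $count = () = $string =~ /[aeiou]/gi;
    return $count;
}

print is_hex_colour("#1a2B3c"), "\n";                 # 1
print is_hex_colour("#12345G"), "\n";                 # 0
print count_vowels("apple, banana, cherry!"), "\n";   # 6
```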
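Unicode character classes in Perl regex:
Use \p{L} to match any letter.

    use utf8;                              # the source contains UTF-8 literals
    binmode(STDOUT, ':encoding(UTF-8)');   # encode wide characters on output
    my $text = "Привет, こんにちは, Hello!";
    # Captures the first run of letters.
    if ($text =~ /(\p{L}+)/) {
        print "Letters: '$1'\n";
    }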
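Special character classes and metacharacters in Perl:
Use . (any character) together with other metacharacters in regex.

    my $text = "abc123!@#";
    # \w+ first grabs "abc123", then backtracks to "abc1" so that
    # . can match "2" and \d+ can match "3".
    if ($text =~ /(\w+).(\d+)/) {
        print "Word: '$1', Digit: '$2'\n";
    }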
Case-insensitive matching with character classes in Perl:
Use the /i modifier for case-insensitive matching.

    my $text = "Hello, hElLo, HELLO!";
    if ($text =~ /hello/i) {
        print "Case-insensitive match found.\n";
    }
Word boundary and non-word boundary in Perl regex:
Use \b for a word boundary and \B for a non-word boundary.

    my $text = "word boundaries";
    if ($text =~ /\bword\b/) {
        print "Word 'word' found with word boundary.\n";
    }
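To contrast \b with \B, the following sketch counts occurrences of "cat" as a whole word and as a fragment inside a longer word. The sample sentence is made up for illustration.

```perl
use strict;
use warnings;

my $text = "cat concatenate bobcat cat.";

# \bcat\b: "cat" standing alone as a whole word.
my $whole_words = () = $text =~ /\bcat\b/g;

# \Bcat\B: "cat" with word characters on both sides (as in "concatenate").
my $inside_words = () = $text =~ /\Bcat\B/g;

print "Whole words: $whole_words, inside words: $inside_words\n";
# Prints: Whole words: 2, inside words: 1
```

The "cat" in "bobcat" is counted by neither pattern: it has a word character before it but a boundary after it.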