Counting the frequency of words in a text is a common task, especially in text processing or linguistic analysis scenarios. Here's a tutorial on how to count word frequencies in Perl:
To start, you need some text. For demonstration purposes we'll use a simple string; in real scenarios you would probably read the text from a file.
#!/usr/bin/perl
use strict;
use warnings;

# Sample text
my $text = "Hello world. Hello everyone. Goodbye world.";

# Convert the text to lowercase and split into words
my @words = split /\W+/, lc $text;

# Hash to store word frequencies
my %frequency;

# Count word frequencies
for my $word (@words) {
    $frequency{$word}++;
}

# Display the frequencies
foreach my $key (sort { $frequency{$a} <=> $frequency{$b} } keys %frequency) {
    print "$key: $frequency{$key}\n";
}
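split /\W+/: This splits the string into words. The \W+ regex matches one or more non-word characters (for plain ASCII text, equivalent to [^a-zA-Z0-9_]), so the string is split at spaces, punctuation, and so on. If the text begins with a non-word character, split returns an empty string as the first element.
lc $text: This converts the entire text to lowercase so that the word counting is case-insensitive.
$frequency{$word}++: For each word, we increment its count in the %frequency hash. A word that is not yet in the hash starts as undef, which ++ treats as 0.
sort { $frequency{$a} <=> $frequency{$b} }: This sorts the words by frequency in ascending order (least frequent first). Words with the same count appear in no particular order, because hash keys are unordered.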
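The sort above is ascending, and words with equal counts come out in whatever order the hash yields. A minimal sketch of a deterministic alternative: it sorts by descending frequency and breaks ties alphabetically, and it also drops the empty string that split produces when the text starts with punctuation. The sub name sorted_frequencies is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count words and return [word, count] pairs,
# most frequent first, ties broken alphabetically.
sub sorted_frequencies {
    my ($text) = @_;
    my %frequency;
    # grep { length } removes the empty leading field split returns
    # when the text begins with a non-word character
    $frequency{$_}++ for grep { length } split /\W+/, lc $text;
    return map { [ $_, $frequency{$_} ] }
           sort { $frequency{$b} <=> $frequency{$a} || $a cmp $b }
           keys %frequency;
}

for my $pair (sorted_frequencies("Hello world. Hello everyone. Goodbye world.")) {
    print "$pair->[0]: $pair->[1]\n";
}
```

For the sample text this prints hello and world (2 each) before everyone and goodbye (1 each), and the order is the same on every run.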
If you wish to process text from a file, replace the sample $text assignment with:
open my $fh, '<', 'path_to_file.txt' or die "Cannot open file: $!";
my $text = join '', <$fh>;
close $fh;
Remember to replace 'path_to_file.txt' with your actual file path.
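Joining the lines from <$fh> works, but a common Perl idiom is to undefine the input record separator $/ inside a local block so that a single read returns the whole file. A minimal sketch; the sub name slurp_file is hypothetical, and the file is assumed to be plain text in the default encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of $path as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    # With $/ undefined, <$fh> reads up to end of file in one go;
    # local restores the previous value of $/ when the do block exits.
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}
```

Usage: my $text = slurp_file('path_to_file.txt');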
This Perl script counts the frequency of words in a given text. It converts the text to lowercase to ensure case-insensitive counting, uses a hash to store the frequencies, and then sorts and prints the results. With slight modifications it can be extended to more complex textual analyses; for large files, reading line by line instead of loading the whole file into $text keeps memory use low.
Counting words in a text file with Perl:
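my $file_path = 'sample.txt';
open my $fh, '<', $file_path or die "Unable to open file '$file_path': $!";

my %word_count;
while (my $line = <$fh>) {
    # split ' ' (a single-space string) splits on runs of whitespace and,
    # unlike split /\s+/, ignores leading whitespace, so no empty "word"
    # is counted; it also makes chomp unnecessary
    my @words = split ' ', $line;
    $word_count{$_}++ foreach @words;
}
close $fh;

# Display word count (sorted so the output order is stable)
foreach my $word (sort keys %word_count) {
    print "$word: $word_count{$word}\n";
}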
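Wrapping the loop in a subroutine that takes an already-open filehandle makes it reusable for files, STDIN, or strings. A minimal sketch; the sub name count_words_in_handle is hypothetical. Opening a reference to a scalar (open my $fh, '<', \$string) creates an in-memory filehandle, which is handy for trying the code without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from an open filehandle and
# return a hash reference of word => count.
sub count_words_in_handle {
    my ($fh) = @_;
    my %word_count;
    while (my $line = <$fh>) {
        # split ' ' splits on whitespace and skips leading whitespace
        $word_count{$_}++ for split ' ', $line;
    }
    return \%word_count;
}

# Demonstration with an in-memory filehandle instead of a real file
my $sample = "the cat\n  the dog\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
my $counts = count_words_in_handle($in);
close $in;
print "$_: $counts->{$_}\n" for sort keys %$counts;
```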
Perl script for word frequency analysis:
my $text = "This is a sample text for word frequency analysis in Perl.";
my %word_count;
my @words = split /\s+/, $text;
$word_count{$_}++ foreach @words;

# Display word count
foreach my $word (keys %word_count) {
    print "$word: $word_count{$word}\n";
}
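For larger texts you usually want only the most frequent words. A minimal sketch that returns the N most common words from a word-count hash such as %word_count above; the sub name top_words and the alphabetical tie-breaking are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: given a hash reference of word => count and N,
# return up to N words, most frequent first (ties sorted alphabetically).
sub top_words {
    my ($counts, $n) = @_;
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
    # Avoid slicing past the end when there are fewer than N words
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

my %word_count;
$word_count{$_}++ for split ' ', "to be or not to be";
print join(', ', top_words(\%word_count, 2)), "\n";
```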
Hashes and arrays for word counting in Perl:
my $text = "Perl programming is fun. Perl scripting is powerful.";
my %word_count;
my @words = split /\s+/, $text;
$word_count{$_}++ foreach @words;

# Display word count
foreach my $word (keys %word_count) {
    print "$word: $word_count{$word}\n";
}
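Hashes have no defined key order, so the output order above can change between runs. When you want words listed in the order they first appear, a common approach is to pair the hash with an array. A minimal sketch; the variable names are illustrative, and this version extracts words with /(\w+)/g so trailing periods are not counted as part of a word.

```perl
use strict;
use warnings;

my $text = "Perl programming is fun. Perl scripting is powerful.";

my %word_count;   # word => number of occurrences
my @order;        # each distinct word, in order of first appearance

for my $word ($text =~ /(\w+)/g) {
    # Post-increment returns the old value (undef the first time),
    # so the word is pushed only on its first occurrence
    push @order, $word unless $word_count{$word}++;
}

print "$_: $word_count{$_}\n" for @order;
```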
Tokenizing and counting words in Perl:
sub count_words {
    my ($text) = @_;
    my %word_count;
    my @words = split /\s+/, $text;
    $word_count{$_}++ foreach @words;
    return %word_count;
}

my $text = "Perl is a versatile programming language.";
my %result = count_words($text);

# Display word count
foreach my $word (keys %result) {
    print "$word: $result{$word}\n";
}
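Returning %word_count flattens the hash into a list of key/value pairs that the caller then rebuilds; returning a reference avoids that copy. A minimal variant, using the hypothetical name count_words_ref.

```perl
use strict;
use warnings;

# Hypothetical variant of count_words that returns a hash reference
sub count_words_ref {
    my ($text) = @_;
    my %word_count;
    $word_count{$_}++ for split ' ', $text;
    return \%word_count;
}

my $result = count_words_ref("Perl is a versatile programming language.");

# Dereference with ->{...} for single values or %$result for the whole hash
foreach my $word (sort keys %$result) {
    print "$word: $result->{$word}\n";
}
```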
Removing stop words in word frequency analysis with Perl:
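my $text = "This is a sample text. It includes common stop words like the and is.";
my %word_count;
my @stop_words = ('the', 'and', 'is');    # Add more stop words as needed

# Build a lookup hash so each membership test is a single hash access
my %is_stop = map { $_ => 1 } @stop_words;

# Extract words (dropping punctuation) and keep only those that are not stop words
my @words = grep { !$is_stop{lc $_} } $text =~ /(\w+)/g;
$word_count{$_}++ foreach @words;

# Display word count
foreach my $word (keys %word_count) {
    print "$word: $word_count{$word}\n";
}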
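Rather than hard-coding @stop_words, you can read them from a file with one word per line. A minimal sketch; the sub name load_stop_words and the one-word-per-line format are assumptions, and it is demonstrated with an in-memory filehandle so it runs without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: read one stop word per line from a filehandle,
# skipping blank lines, and return a lookup hash reference (lowercased).
sub load_stop_words {
    my ($fh) = @_;
    my %is_stop;
    while (my $line = <$fh>) {
        # Trim surrounding whitespace, including the trailing newline
        $line =~ s/^\s+|\s+$//g;
        $is_stop{lc $line} = 1 if length $line;
    }
    return \%is_stop;
}

my $list = "the\nAnd\n\nis\n";
open my $fh, '<', \$list or die "Cannot open in-memory handle: $!";
my $is_stop = load_stop_words($fh);
close $fh;

my @kept = grep { !$is_stop->{lc $_} } "The cat and the hat is here" =~ /(\w+)/g;
print "@kept\n";
```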
Perl regular expressions for word extraction:
my $text = "Regular expressions provide powerful text processing features in Perl.";
my @words = $text =~ /\b\w+\b/g;

# Display extracted words
print join(', ', @words) . "\n";
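\w+ treats apostrophes and hyphens as separators, so "don't" becomes "don" and "t". If your text contains such words, one option is to allow a ' or - between runs of word characters. A sketch; whether hyphenated forms should count as one word depends on your analysis.

```perl
use strict;
use warnings;

my $text = "Don't split well-known words - or trailing dashes-";

# One or more word characters, optionally followed by groups of
# an apostrophe or hyphen plus more word characters
my @words = $text =~ /(\w+(?:['-]\w+)*)/g;

print join(', ', @words), "\n";
```

A lone or trailing hyphen is not followed by a word character, so it is never included in a match.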
Perl script for analyzing text data:
my $text = "Text analysis involves word counting, tokenizing, and pattern matching.";
my @words = split /\s+/, $text;

# Word count
my $word_count = scalar @words;

# Tokenizing
my $tokens = join(', ', @words);

# Pattern matching
my @matches = $text =~ /(\w+ing)/g;

# Display results
print "Word Count: $word_count\n";
print "Tokens: $tokens\n";
print "Matches: " . join(', ', @matches) . "\n";
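If you only need how many times a pattern matches, assigning the match to an empty list and then to a scalar (the () = idiom) gives the count directly. This sketch also adds \b anchors so that, for example, "king" inside "kingdom" is not matched; the extra word in the sample text and the stricter pattern are illustrative assumptions about what you want to count.

```perl
use strict;
use warnings;

my $text = "Text analysis involves word counting, tokenizing, and pattern matching in a kingdom.";

# \b on both sides restricts matches to whole words ending in "ing"
my @matches = $text =~ /\b(\w+ing)\b/g;

# A list assignment in scalar context yields the number of matches
my $ing_count = () = $text =~ /\b\w+ing\b/g;

print "Matches: ", join(', ', @matches), "\n";
print "Count: $ing_count\n";
```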
Case-insensitive word frequency count in Perl:
my $text = "Perl and python are both scripting languages. Perl is case-sensitive.";
my %word_count;
my @words = split /\s+/, $text;
$word_count{lc($_)}++ foreach @words;

# Display word count
foreach my $word (keys %word_count) {
    print "$word: $word_count{$word}\n";
}
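lc is enough for ASCII text. For Unicode text, Perl 5.16 and later provide fc, which applies full case folding and is the recommended way to compare strings case-insensitively. A minimal sketch, assuming Perl 5.16+ and a source file saved as UTF-8 (hence use utf8).

```perl
use strict;
use warnings;
use utf8;                 # the string literals below are UTF-8
use feature 'fc';         # fc is available from Perl 5.16
binmode STDOUT, ':encoding(UTF-8)';

my $text = "Straße STRASSE strasse";
my %word_count;

# fc folds "ß" to "ss", so all three spellings share one key;
# lc would keep "straße" separate from "strasse"
$word_count{fc $_}++ for split ' ', $text;

print "$_: $word_count{$_}\n" for sort keys %word_count;
```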
Generating word cloud from text in Perl:
Description: Generate a simple word cloud from text.
use WordCloud;

my $text = "Perl is a powerful programming language used for various purposes.";
my $wordcloud = WordCloud->new();
$wordcloud->generate($text);
$wordcloud->to_file("wordcloud.png");
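Note: This example assumes a module named WordCloud with new, generate, and to_file methods. Treat this interface as illustrative: the module may need to be installed separately, if it is available under that name at all. Check CPAN for an actual word-cloud module, such as Image::WordCloud, and follow its documentation for the real interface.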
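If you just want a quick visual impression without installing a module, a plain-text bar chart of word frequencies is a dependency-free stand-in. This is a different technique from an image word cloud; the sub name frequency_bars, the column width, and the bar character are illustrative choices.

```perl
use strict;
use warnings;

# Hypothetical helper: return one line per word, most frequent first,
# with a bar of '#' characters whose length equals the word's count.
sub frequency_bars {
    my ($text) = @_;
    my %count;
    $count{lc $_}++ for $text =~ /(\w+)/g;
    my @words = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return map { sprintf "%-12s %s", $_, '#' x $count{$_} } @words;
}

print "$_\n" for frequency_bars("Perl is fun and Perl is powerful; Perl is Perl.");
```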