Count the frequency of words in text in Perl

Counting the frequency of words in a text is a common task, especially in text processing or linguistic analysis scenarios. Here's a tutorial on how to count word frequencies in Perl:

1. Prerequisites:

To start with, you'll need a text. For demonstration purposes, we'll use a simple string, but in real scenarios, you'd probably read from a file.

2. The Script:

#!/usr/bin/perl
use strict;
use warnings;

# Sample text
my $text = "Hello world. Hello everyone. Goodbye world.";

# Convert the text to lowercase and split into words
my @words = split /\W+/, lc $text;

# Hash to store word frequencies
my %frequency;

# Count word frequencies
for my $word (@words) {
    $frequency{$word}++;
}

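# Display the frequencies, most frequent first (ties in alphabetical order)
foreach my $key (sort { $frequency{$b} <=> $frequency{$a} or $a cmp $b } keys %frequency) {
    print "$key: $frequency{$key}\n";
}

3. Explanation:

  • split /\W+/: This splits the string into words. The \W+ regex matches one or more non-word characters (for plain ASCII text, anything outside [a-zA-Z0-9_]), so the string is split at spaces, punctuation, etc.

  • lc $text: This converts the entire text to lowercase so that the word counting is case-insensitive.

  • $frequency{$word}++: For each word, we increment its count in the %frequency hash. A word seen for the first time starts out undefined, which ++ treats as 0.

  • sort { $frequency{$b} <=> $frequency{$a} or $a cmp $b }: This sorts the words by frequency, highest first. Putting $b before $a in the numeric comparison gives descending order, and the cmp fallback orders words with equal counts alphabetically, so the output is the same on every run (Perl's hash key order is otherwise unpredictable).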
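One caveat with split /\W+/: if the text starts with punctuation (an opening quote, for instance), split returns an empty first field, and the loop would count "" as a word. Extracting the runs of word characters with a global match avoids this. The sketch below compares the two; the subroutine names tokens_split and tokens_match are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = q{"Hello," she said. "Hello!"};

# split on runs of non-word characters: a leading quote yields an empty first field
sub tokens_split { my ($t) = @_; return split /\W+/, lc $t }

# a global match in list context returns only the runs of word characters
sub tokens_match { my ($t) = @_; return lc($t) =~ /\w+/g }

print join('|', tokens_split($text)), "\n";
print join('|', tokens_match($text)), "\n";
```

The first line printed is |hello|she|said|hello (note the empty first token); the second is hello|she|said|hello.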

4. Reading from a File:

If you wish to process text from a file, replace the sample $text assignment with:

open my $fh, '<', 'path_to_file.txt' or die "Cannot open file: $!";
my $text = join '', <$fh>;
close $fh;

Remember to replace 'path_to_file.txt' with your actual file path.
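Joining the file into $text keeps the rest of the script unchanged, but it holds the whole file in memory. A sketch that reads and counts one line at a time instead is shown below. It assumes the file is UTF-8 encoded and takes the path from the command line; the subroutine name count_file_words is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count word frequencies in one file, reading it a line at a time so that
# memory use grows with the number of distinct words, not the file size
sub count_file_words {
    my ($path) = @_;
    # Assumption: the input file is UTF-8 encoded
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open '$path': $!";
    my %frequency;
    while (my $line = <$fh>) {
        # Lowercase each run of word characters and count it
        $frequency{ lc $_ }++ for $line =~ /\w+/g;
    }
    close $fh;
    return \%frequency;
}

# Usage: perl count_file_words.pl path_to_file.txt
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    my $freq = count_file_words($ARGV[0]);
    for my $word (sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq) {
        print "$word: $freq->{$word}\n";
    }
}
```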

5. Summary:

This Perl script counts the frequency of words in a given text. It converts the text to lowercase so that counting is case-insensitive, stores the counts in a hash, and prints the words sorted by frequency. For very large files, reading and counting one line at a time, rather than joining the whole file into a single string, keeps memory use low; the first example below reads its input that way.
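For longer texts, printing every word is rarely useful. The sketch below returns only the N most frequent words, breaking ties alphabetically so the result is stable; the subroutine name top_words, the sample sentence, and the limit of 3 are illustrative choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the $n most frequent words as [word, count] pairs,
# highest count first, ties broken alphabetically
sub top_words {
    my ($text, $n) = @_;
    my %count;
    $count{ lc $_ }++ for $text =~ /\w+/g;
    my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    $n = @sorted if $n > @sorted;    # do not slice past the end of the list
    return map { [ $_, $count{$_} ] } @sorted[0 .. $n - 1];
}

my $text = "the cat and the hat and the bat";
for my $pair (top_words($text, 3)) {
    printf "%-5s %d\n", @$pair;
}
```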

  1. Counting words in a text file with Perl:

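    • Description: Read a text file line by line and count the occurrences of each whitespace-separated word. Punctuation stays attached to words, so "world" and "world." are counted separately.
    • Code:
      my $file_path = 'sample.txt';
      open my $fh, '<', $file_path or die "Unable to open '$file_path': $!";
      my %word_count;
      
      while (my $line = <$fh>) {
          # split ' ' splits on runs of whitespace and, unlike split /\s+/,
          # skips leading whitespace, so indented lines yield no empty "word";
          # the trailing newline is whitespace too, so no chomp is needed
          my @words = split ' ', $line;
          $word_count{$_}++ foreach @words;
      }
      
      close $fh;
      
      # Display word count in alphabetical order
      foreach my $word (sort keys %word_count) {
          print "$word: $word_count{$word}\n";
      }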
      
  2. Perl script for word frequency analysis:

    • Description: Analyze word frequency in a given text.
    • Code:
      my $text = "This is a sample text for word frequency analysis in Perl.";
      my %word_count;
      
      my @words = split /\s+/, $text;
      $word_count{$_}++ foreach @words;
      
      # Display word count
      foreach my $word (keys %word_count) {
          print "$word: $word_count{$word}\n";
      }
      
  3. Hashes and arrays for word counting in Perl:

    • Description: Use hashes and arrays to count words in Perl.
    • Code:
      my $text = "Perl programming is fun. Perl scripting is powerful.";
      my %word_count;
      
      my @words = split /\s+/, $text;
      $word_count{$_}++ foreach @words;
      
      # Display word count
      foreach my $word (keys %word_count) {
          print "$word: $word_count{$word}\n";
      }
      
  4. Tokenizing and counting words in Perl:

    • Description: Tokenize and count words using a subroutine.
    • Code:
      sub count_words {
          my ($text) = @_;
          my %word_count;
      
          my @words = split /\s+/, $text;
          $word_count{$_}++ foreach @words;
      
          return %word_count;
      }
      
      my $text = "Perl is a versatile programming language.";
      my %result = count_words($text);
      
      # Display word count
      foreach my $word (keys %result) {
          print "$word: $result{$word}\n";
      }
      
  5. Removing stop words in word frequency analysis with Perl:

    • Description: Exclude common stop words from word frequency analysis.
    • Code:
      my $text = "This is a sample text. It includes common stop words like the and is.";
      my %word_count;
      
      # Store stop words as hash keys so each check is a single lookup
      my %is_stop = map { $_ => 1 } ('the', 'and', 'is');  # Add more stop words as needed
      
      # Extract words without punctuation, lowercase them, and drop stop words
      my @words = grep { !$is_stop{$_} } map { lc($_) } $text =~ /(\w+)/g;
      $word_count{$_}++ foreach @words;
      
      # Display word count
      foreach my $word (sort keys %word_count) {
          print "$word: $word_count{$word}\n";
      }
      
  6. Perl regular expressions for word extraction:

    • Description: Extract words using regular expressions.
    • Code:
      my $text = "Regular expressions provide powerful text processing features in Perl.";
      my @words = $text =~ /\b\w+\b/g;
      
      # Display extracted words
      print join(', ', @words) . "\n";
      
  7. Perl script for analyzing text data:

    • Description: Analyze text data using various techniques.
    • Code:
      my $text = "Text analysis involves word counting, tokenizing, and pattern matching.";
      my @words = split /\s+/, $text;
      
      # Word count
      my $word_count = scalar @words;
      
      # Tokenizing
      my $tokens = join(', ', @words);
      
      # Pattern matching
      my @matches = $text =~ /(\w+ing)/g;
      
      # Display results
      print "Word Count: $word_count\n";
      print "Tokens: $tokens\n";
      print "Matches: " . join(', ', @matches) . "\n";
      
  8. Case-insensitive word frequency count in Perl:

    • Description: Perform case-insensitive word frequency count.
    • Code:
      my $text = "Perl and python are both scripting languages. Perl is case-sensitive.";
      my %word_count;
      
      my @words = split /\s+/, $text;
      $word_count{lc($_)}++ foreach @words;
      
      # Display word count
      foreach my $word (keys %word_count) {
          print "$word: $word_count{$word}\n";
      }
      
  9. Generating word cloud from text in Perl:

    • Description: Generate a simple word cloud from text.

    • Code:

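      # Hypothetical interface: WordCloud, new(), generate() and to_file()
      # are placeholders, not the API of a specific CPAN module
      use WordCloud;
      
      my $text = "Perl is a powerful programming language used for various purposes.";
      my $wordcloud = WordCloud->new();
      $wordcloud->generate($text);
      $wordcloud->to_file("wordcloud.png");
      
    • Note: WordCloud and its methods above are placeholders for illustration. Word-cloud modules are available on CPAN (for example Image::WordCloud), but each has its own interface, so install one and follow its documentation rather than running this code as written.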
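If installing an image module is not an option, a rough plain-text substitute for example 9 is to scale each word's count into a bar of # characters. This is a frequency chart rather than a true word cloud; the subroutine name bar_chart and the bar width of 20 characters are arbitrary choices for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one "word  ####" line per word; the bar for the most frequent
# word is $width characters long and the others are scaled relative to it
sub bar_chart {
    my ($count, $width) = @_;
    my ($max) = sort { $b <=> $a } values %$count;
    return map {
        my $len = int($count->{$_} / $max * $width + 0.5);   # round to nearest
        sprintf '%-12s %s', $_, '#' x $len;
    } sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

my $text = "Perl is a powerful programming language used for various purposes. Perl is fun.";
my %count;
$count{ lc $_ }++ for $text =~ /\w+/g;

print "$_\n" for bar_chart(\%count, 20);
```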
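Example 4 returns %word_count as a flat list of key/value pairs, which the caller then rebuilds into a new hash. A common alternative is to return a hash reference. The sketch below does that and adds a hypothetical merge_counts helper that totals the counts from several texts; like example 4, it splits on whitespace, so punctuation stays attached ("language." is its own key).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a reference to a word => count hash instead of a flat list
sub count_words {
    my ($text) = @_;
    my %word_count;
    $word_count{$_}++ for split ' ', $text;
    return \%word_count;
}

# Hypothetical helper: add together the counts from several hash references
sub merge_counts {
    my %total;
    for my $counts (@_) {
        $total{$_} += $counts->{$_} for keys %$counts;
    }
    return \%total;
}

my $total = merge_counts(
    count_words("Perl is a versatile programming language."),
    count_words("Perl is fun"),
);
print "$_: $total->{$_}\n" for sort keys %$total;
```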
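The \w+ pattern in example 6 breaks contractions and hyphenated words apart ("don't" becomes "don" and "t"). If such words should stay whole, one option is to allow an apostrophe or hyphen only when more word characters follow it. Where to draw that line is a design choice rather than a standard rule; the sketch below shows one version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Don't split case-sensitive words - or trailing hyphens-";

# \w+ alone breaks at every apostrophe and hyphen
my @plain = $text =~ /\w+/g;

# Allow ' or - inside a word only when followed by more word characters
my @joined = $text =~ /\w+(?:['-]\w+)*/g;

print join(', ', @plain), "\n";
print join(', ', @joined), "\n";
```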
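Example 8 lowercases with lc, which is enough for ASCII text. For Unicode text, Perl 5.16 and later also provide fc (full case folding), which treats forms such as "Straße" and "STRASSE" as the same word where lc does not. The sketch below assumes the script file itself is saved as UTF-8 (hence use utf8) and that the terminal accepts UTF-8 output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                    # string literals in this file are UTF-8
use feature 'fc';            # fc() is available from Perl 5.16
binmode STDOUT, ':encoding(UTF-8)';

my @words = ('Straße', 'STRASSE', 'strasse');

my (%by_lc, %by_fc);
$by_lc{ lc $_ }++ for @words;   # "straße" and "strasse" stay separate keys
$by_fc{ fc $_ }++ for @words;   # all three fold to "strasse"

print "lc: ", join(', ', map { "$_=$by_lc{$_}" } sort keys %by_lc), "\n";
print "fc: ", join(', ', map { "$_=$by_fc{$_}" } sort keys %by_fc), "\n";
```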