It is certainly possible to use other languages to do the same, but PL/Perl has an edge in a number of important ways. PL/Perl is light-weight, flexible and fills this particular need better than any other language I have worked with.
While one of the considerations has often been knowledge of Perl in the team, PL/Perl has a number of specific reasons to recommend it:
- It is light-weight compared to PL/Java and many other languages
- It excels at processing text in general ways.
- It has extremely mature regular expression support
Moreover when you mark your functions as immutable, you can index the output, and this is helpful when you want ordered records starting at a certain point.
So for example, suppose we want to be able to query on plasmid lines in UNIPROT documents but we have not set this up before we loaded the table. We could easily create a PL/Perl function like:
CREATE OR REPLACE FUNCTION plasmid_lines(uniprot text)
RETURNS text[]
LANGUAGE PLPERL IMMUTABLE AS
$$
use strict;
use warnings;
my ($uniprot) = @_;
my @lines = grep { /^OG\s+Plasmid/ } split /\n/ $uniprot;
return [ map { my $l = $_; $l =~ s/^OG\s+Plasmid\s*//; $l } @lines ];
$$;
You could then create a GIN index on the array elements:
CREATE INDEX uniprot_doc_plasmids ON uniprot_docs USING gin (plasmid_lines(doc));
Neat!