Thursday, January 26, 2017

PL/Perl and Large PostgreSQL Databases

One of the topics discussed in the large database talk is the way we used PL/Perl to solve some data variety problems in terms of extracting data from structured text documents.

It is certainly possible to use other languages to do the same, but PL/Perl has an edge in a number of important ways.  PL/Perl is light-weight, flexible and fills this particular need better than any other language I have worked with.

While one of the considerations has often been knowledge of Perl in the team, PL/Perl has a number of specific reasons to recommend it:

  1. It is light-weight compared to PL/Java and many other languages
  2. It excels at processing text in general ways.
  3. It has extremely mature regular expression support
These features combine to create a procedural language for PostgreSQL which is particularly good at extracting data from structured text documents in the scientific space.  Structured text files are very common and being able to extract, for example, a publication date or other information from the file is very helpful.

Moreover when you mark your functions as immutable, you can index the output, and this is helpful when you want ordered records starting at a certain point.

So for example, suppose we want to be able to query on plasmid lines in UNIPROT documents but we have not set this up before we loaded the table.  We could easily create a PL/Perl function like:

CREATE OR REPLACE FUNCTION plasmid_lines(uniprot text) 
RETURNS text[]
LANGUAGE PLPERL IMMUTABLE AS
$$
use strict;
use warnings;
my ($uniprot) = @_;
my @lines = grep { /^OG\s+Plasmid/ } split /\n/ $uniprot;
return [ map {  my $l = $_; $l =~ s/^OG\s+Plasmid\s*//; $l } @lines ];
$$;


You could  then create a GIN index on the array elements:

CREATE INDEX uniprot_doc_plasmids ON uniprot_docs USING gin (plasmid_lines(doc));

Neat!

6 comments:

  1. ERP Software SolutionsERP Software Solutions gives the correct view of business information.

    ReplyDelete
  2. Thanks for sharing this wonderful information. I hope you will share more helpful information regarding the content.
    web portal development company in chennai
    sem services in chennai
    professional web design company in chennai

    ReplyDelete
  3. Nice blog with Amazing information .. love to read about this.
    Awaiting for your new post
    We at Fullassignment.com bring to you the most significant Case Study help writing service at the best cost. With long stretches of understanding we are prepared to give assignment help over the globe.You will be guided here with a portion of the information of Marketing assignment which could assist you in deciding writing a Marketing assignment. Nonetheless, we unequivocally prescribe you to benefit Case Study Assignment Help from our specialist to find out about marketing and its scope.We also provide Chemistry Assignment Help from our experts.

    https://fullassignment.com/

    ReplyDelete
  4. Your article is incredibly elegantly composed. This is extraordinary instructive substance from my perspective. You additionally make numerous admirable statements with convincing, novel substance.


    SEO services in kolkata
    Best SEO services in kolkata
    SEO company in kolkata
    Best SEO company in kolkata
    Top SEO company in kolkata
    Top SEO services in kolkata
    SEO services in India
    SEO copmany in India

    ReplyDelete