Thursday, February 27, 2014

In Praise of Perl 5

I have decided to do a two-part series here praising Perl in part 1 and PostgreSQL in part 2.  I expect that PostgreSQL folks will get a lot out of the Perl post and vice versa.  The real power of both programming environments is in the relationship between domain-specific languages and general purpose languages.

These posts are aimed at software engineers more than developers and they make the case for building frameworks on these platforms.  The common thread is flexibility and productivity.

This first post is about Perl 5.  Perl 6 is a different language, more a little sister to Perl 5 than a successor.  The basic point is that Perl 5 gives you a way to build domain specific languages (DSL's) that can be seemlessly worked into a general purpose programming environment.  This is almost the exact inverse of PostgreSQL, which offers, as a development environment, a DSL with an ability to work in almost any general purpose development tools into it.  This combination is extremely powerful as I will show.

All code in this post is from the LedgerSMB codebase in different eras (before the fork, during the early rewrite, and now planned code for 1.5).  All code in this post may be used under the GNU General Public License version 2 or at your option any later version.

You can see in the code samples below our evolution in how we use Perl.

This is (bad) Perl


Perl is a language many people love to hate.  Here's an example of bad Perl from early in the LedgerSMB codebase.  It is offered as an example of sloppy coding generally and why maintaining Perl code can be difficult at times.  Note the module makes no use of strict or warnings, and almost all variables are globally package-scoped.

     $column_header{description} =
        "<th><a class=listheading href=$href&sort=linedescription>"
      . $locale->text('Description')
      . "</a></th>";

    $form->{title} =
      ( $form->{title} ) ? $form->{title} : $locale->text('AR Transactions');

    $form->header;

    print qq|
<body>

<table width=100%>
  <tr>
    <th class=listtop>$form->{title}</th>
  </tr>
  <tr height="5"></tr>
  <tr>
    <td>$option</td>


This is not a good piece of maintainable code.  It is very hard to modify safely.  Due to global scoping, unit tests are not possible.  There are many other problems as well.  One of our major goals in LedgerSMB is to rewrite all this code as quickly as we can without making the application unusably unstable.

So nobody can argue that it is possible to create unmaintainable messes in Perl.  But it is possible to do this sort of thing in any language.  One can't judge a language solely because it is easy to write bad code in it.

This is (better) Perl


So what does better Perl look like?   Let's try this newer Perl code, which was added late in the LedgerSMB 1.3 development cycle, and handles asset depreciation:

sub depreciate_all {
    my ($request) = @_;
    my $report = LedgerSMB::DBObject::Asset_Report->new(base => $request);
    $report->get_metadata;
    for my $ac(@{$report->{asset_classes}}){
        my $dep = LedgerSMB::DBObject::Asset_Report->new(base => $request);
        $dep->{asset_class} = $ac->{id};
        $dep->generate;
        for my $asset (@{$dep->{assets}}){
            push @{$dep->{asset_ids}}, $asset->{id};
        }
        $dep->save;
    }
    $request->{message} = $request->{_locale}->text('Depreciation Successful');
    my $template = LedgerSMB::Template->new(
        user =>$request->{_user},
        locale => $request->{_locale},
        path => 'UI',
        template => 'info',
        format => 'HTML'
    );
    $template->render($request);
}


This function depreciates all asset classes to a point at a specific date.  There's a fair bit of logic here but it does many times more work than the previous example, is easier to maintain, and is easier to understand.

This is also Perl!


The above two examples are  pretty straight-forward Perl code examples, but neither one really shows what Perl is capable of doing in terms of writing good-quality, maintainable code.

The fact is that Perl itself is a highly malleable language and this malleability allows you to define domain-specific languages for parts of your program and use them there.

Here's a small class for handling currency records.  POD and comments have been removed.

package LedgerSMB::Currency;
use Moose;
with 'LedgerSMB::PGOSimple::Role', 'LedgerSMB::MooseTypes';

use PGObject::Util::DBMethod;
   
sub _set_prefix { 'currency__' }


has id                => (is => 'rw', isa => 'Int', required => '0');
has symbol            => (is => 'ro', isa => 'Str', required => '1');
has allowed_variance  => (is => 'rw', isa => 'LedgerSMB::Moose::Number',
                          coerce => 1, required => 1);
has display_precision => (is => 'rw', isa => 'Int', required => '0');
has is_default => (is => 'ro', isa => 'Bool', required => '0');

dbmethod list      => (funcname => 'list', returns_objects => 1 );
dbmethod save      => (funcname => 'save', merge_back => 1);

dbmethod get       => (funcname => 'get', returns_objects => 1,
                          arg_list => ['symbol']);

dbmethod get_by_id =>  (funcname => 'get_by_id', returns_objects => 1,
                        arg_list => ['id']);

__PACKAGE__->meta->make_immutable;

Now the code above sets up a whole class including properties, accessors, and methods delegated to database stored prcedures.  The class is effectively entirely declarative.  The same amount of work in a similarly simple module from the 1.3 iteration (TaxForm.pm) requires around 50 lines of code, so more than double, and that's without accessor support.  The 1.4-framework module for handling contact information (phone numbers and email addresses) is around 65 lines of code, with not much more complexity (so around triple).  The simpler Bank.pm (for tracking bank account information) is around 36 lines so nearly double.

What differentiates the examples though is not only line count but readability, testability, and maintainability.  The LedgerSMB::Currency module is more concise, more readable, and has much better testing and maintenance characteristics than the longer modules from the previous frameworks.  Even without comments or POD, if you read the Moose and PGObject::Util::DBMethod documentation, you know immediately what the module does.  And in such a module, comments may not be appropriate, but POD would likely not only be appropriate but take up significantly more space than the code.

How does that work?


Perl is a very flexible and mutable language.  While you can't add keywords, you can add functions that behave more or less like keywords.  Functions can be exported from one module to another and, used judiciously, this can be used to create domain-specific languages which in fact run on generated Perl code.

The example here uses two modules which provide DSL's for specific purposes.  The first is Moose, which has a long history as an extremely important contributor to current Perl object-oriented programming practices.  This module provides the functions "with" and "has" used above.

Moose, in this case works with a PGObject::Simple::Role module which provides a framework for interacting with PostgreSQL db's.  This is extended by LedgerSMB::PGOSimple::Role which provides handling of database connections and the like.

The second is PGObject::Util:DBmethod, which provides the dbmethod function.  It's worth noting that both has and dbmethod are code generators.  When they run, they create functions which they attach to the package.  Used in this way has creates the accessors, while dbmethod creates the delegated methods.

Why is this Powerful and Productive?


The use of robust code generation here at run-time allows you to effectively build modules and classes from specifications of modules and classes rather than implementing that specification by hand.  Virtually all object-oriented frameworks in Perl effectively offer some form of this code generation.

A specification to code language provides a general tradeoff between clarity, expressiveness (in its domain) and robustness on one hand, with inflexibility on the other.  This is the fundamental tradeoff of domain-specific languages generally.  When you merge a domain-specific language into a general-purpose one, however, you gain the freedom to compensate for the lack of flexibility by falling back on more general tools when you need to.  This flexibility is where the production gains are found.

Compare a framework built as a mini-DSL specification language to one built as an object model.  In an object model framework one effectively has to juggle object-oriented design (SOLID principles, etc) with the desire for greater flexibility.  Here, however the DSL's are orthogonal to the object model.  They allow you to define the object model orthogonally to the framework, while re-using the DSL framework however you want.  Of course these are not mutually exclusive, and it is quite possible to have both in a large and powerful application framework.

Other Similarly Powerful Languages


 Perl is not the only language of this kind.  The first example that comes to mind, naturally, is Lisp.  However other Lispy languages are also worth mentioning.  Most prominent among these are Rebol and Red, whose open source implementations are still very immature.  These languages are extremely mutable and the syntax can be easily extended or even rewritten.

Metaprogramming helps to some extent with some of these issues and this is a common way of addressing this in Ruby and Python, but this makes it much harder to build a framework that is truly orthogonal to the object model.

A major aspect of the power of Perl 5 here are the very things which often cause beginning and intermediate programmers headache.  Perl allows, to  a remarkable extent, manipulation of its own internals (perhaps only Rebol, Red, and Lisp take this further).  This allows one to rewrite the language to a remarkable extent, but it also allows for the development of contexts which allow for these sorts of extensions.

The key feature I am looking at here is the mutability of the language.  And there are few languages which are themselves relatively mutable.  Perl isn't just a programming language, but a toolkit for building programming languages inside it.

Monday, February 24, 2014

Notes on Software Testing

It seems software testing is one of those really hard things to get right.  I find very often I run into projects where the testing is inadequate or where, in an overzealous effort to ensure no bugs, test cases are too invasive and test things that shouldn't be tested.  This article attempts to summarize what I have learned from experience in the hope that it is of use to others.

The Two Types of Tests


Software testing has a couple of important functions and these are different enough to form the basis of a taxonomy system of test cases.  These functions include:
  1. Ensuring that a change to a library doesn't break something else
  2. Ensuring general usability by the intended audience
  3. Ensuring sane handling of errors
  4. Ensuring safe and secure operation under all circumstances
 These functions fall themselves into roughly two groups.  The first ensures that the software functions as designed, and the second ensures that where undefined behavior exists, it occurs in a sane and safe way.

The first type of tests then are those which ensure that the behavior conforms to the designed outlines of the contract with downstream developers or users.  This is what we may call "design-implementation testing."

The second type of tests are those which ensure that behavior outside the designed parameters is either appropriately documented or appropriately handled, and can be deployed and used in a safe and secure manner.  This, generally, reduces to error testing.

These two types of tests are different enough they need to be written by different groups.  The design-implementation tests are really best written by the engineers designing the software, and the error tests need to be handled by someone somewhat removed from that process.

Why Software Engineers Should Write Test Cases


Design-implementation tests are a formalization of the interface specification.  As such a formalization the people best prepared to write good software contract tests are those specifying the software contracts, namely the software engineers.

There are a couple ways this can be done.  Engineers can write quick pseudocode intended to document interfaces and test cases to define the contracts, or can develop a quick prototype with test cases before handing off to developers, or the engineers and the developers can be closely integrated.  Either way the engineers are in the best position, knowledge-wise, to write test cases about whether the interface contracts are violated or not.

This works best with an initial short iteration cycle (regarding prototypes).  However the full development could be on a much larger cycle so it is not necessarily limited to agile development environments.

Having the engineers write these sorts of test cases ensures that a few very basic principles are not violated:

  1. The tests do not test the internals of dependencies beyond necessity
  2. The tests focus on interface instead of implementation
These rules help avoid test cases broken needlessly when dependencies fix bugs.

Why You Still Need QA Folks Who Write Tests After the Fact


Interface and design-implementation tests are not enough.  They cover very basic things, and ensure that correct operation will continue.  However they don't generally cover error handling very well, nor do they cover security-critical questions very well.

For good error handling tests, you really need an outside set of eyes, not too deeply tied to current design or coding.  It is easier for an outsider to spot that "user is an idiot" that was left in as a placeholder in an error message than it is for the developer or the engineer.  Some of these can be reduced by cross-team review of changes as they come in.

A second problem is that to test security-sensitive failure modes, you really need someone who can think about how to break an interface, not just what it was designed to do.  The more invested one is, brain-cycle-wise, in implementing the software, the harder it often is to see this.

Conclusion


Software testing is something which is best woven into the development process relatively deeply and should be both a before and after main development.  Writing test cases is often harder than writing code, and this goes double for good test cases vs good code.

Now obviously there is a difference in testing SQL stored procedures than testing C code, and there may be cases where you can dispense to a small extent with some after-the-fact testing (particularly in declarative programming environments).   After all, you don't have to test what you can prove, but you cannot prove that an existing contract will be maintained into the future.