Perspectives on LedgerSMB: Writing Reports in LedgerSMB 1.4: (Mostly-) Declarative Perl Modules

So far we have talked about getting data in, and interacting with the database. Now we will talk about the largest of the modules and cover workflow scripts in relation to this stage.

At this point you would have a filter screen, a user defined function which would take the arguments from that screen's inputs (prefixed with 'in_' usually to avoid column name conflicts), and a tabular data structure you expect to return.

As a note here, all the code I am trashing here is my own, in part because I have learned a lot about how to code with Moose over the course of 1.4 development.

In your workflow script you are likely to need to add the following:

use LedgerSMB::Report::MyNewReport;

and

sub run_my_new_report {
    my ($request) = @_;
    LedgerSMB::Report::MyNewReport->new(%$request)->render($request);
}

That's all you need in the workflow script.

Overall Structure and Preamble

The actual Perl module basically defines a number of parameters for the report, and LedgerSMB::Report provides a general framework to cut down on the amount of code (and knowledge of Perl) required to write a report. Minimally we must, however, define inputs, if any, the output structure, and how to create the output structure. We can also define buttons for further actions on the workflow script. The same workflow script would have to handle the buttons' actions.

Typically a report will start with something like this (of course MyNewReport is the name of the report here):

package LedgerSMB::Report::MyNewReport;
use Moose;
extends 'LedgerSMB::Report';
with 'LedgerSMB::Report::Dates'; # if date inputs used, use standard defs

This preamble sets up the basic reporting framework generally along with all the features discussed below. If you need to handle numeric input or secondary dates you will want to change:

with 'LedgerSMB::Report::Dates';

to

with 'LedgerSMB::Report::Dates', 'LedgerSMB::MooseTypes';

so that you can use type coercions for numeric and/or date fields (for processing localized formattings and the like).

Defining Inputs

Inputs are defined as report properties. Usually you want these properties to be read-only because you want them to correspond to the report actually run. You can use the full Moose capabilities in restricting inputs; typically you will want to restrict by type and possibly coerce as well (at least when using the types defined in LedgerSMB::MooseTypes).

When including the following line you do not have to define the from_date and to_date inputs:

with 'LedgerSMB::Report::Dates';

Typically our conventions are to document inputs inline with POD. While this is (obviously) not necessary for the functioning of the report, it is helpful for future maintenance and highly recommended. It is also worth noting in the POD how a match is made (this should be in SQL also if applicable, in a COMMENT ON statement for easy checking of common assumptions regarding API contracts).

For example, from the GL report:

=item amount_from

The lowest value that can match, amount-wise.

=item amount_to

The highest value that can match, amount-wise.

=cut

has 'amount_from' => (is => 'rw', coerce => 1,
                     isa => 'LedgerSMB::Moose::Number');
has 'amount_to' => (is => 'rw', coerce => 1,
                   isa => 'LedgerSMB::Moose::Number');

Those lines demonstrate the full power of Moose in the definition. One obvious thing that will be fixed in beta is making these read-only (is => 'ro') while they are currently read-write. There is no reason for these to be read-write.

From the LedgerSMB::Report::PNL you see the following optional string input defined:

=item partnumber

This is the control code of the labor/overhead, service, or part consumed.

=cut

has partnumber => (is => 'ro', isa => 'Str', required => 0);

This would probably be improved by mentioning that the partnumber is an exact match in the POD, but it shows how to designate a read-only, optional string input.

If an input is not listed, it won't be passed on to the stored procedure. It is critical that all inputs are defined whether using standard modular definitions (LedgerSMB::Report::Dates) or explicit ones. If an input is being ignored this is one of the first places to check. Additionally note that because of other aspects of the reporting, it is not currently possible to use strict or slurpy constructors in any sane way. It is likely we will build our own constructor handling in the future, but currently this is a hard limitation.
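To picture why an undeclared input vanishes, here is a minimal plain-Perl sketch. It is not the LedgerSMB or Moose code; pick_declared and the @declared_inputs list are hypothetical names I made up for illustration, standing in for the attributes a report declares with has.

```perl
use strict;
use warnings;

# Hypothetical list of inputs a report class has declared with 'has'.
my @declared_inputs = qw(from_date to_date amount_from amount_to);

# Keep only the request keys that match a declared input.
sub pick_declared {
    my ($request, @declared) = @_;
    my %args;
    for my $name (@declared) {
        $args{$name} = $request->{$name} if exists $request->{$name};
    }
    return \%args;
}

my $request = { amount_from => '100', amount_to => '500', memo => 'rent' };
my $args    = pick_declared($request, @declared_inputs);

# 'memo' was never declared, so it is silently dropped.
print join(', ', sort keys %$args), "\n";
```

If a filter value seems to have no effect on the results, a missing declaration of this kind is the first thing I would look for.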

Input Definition Antipatterns

There are a few things which someone who has not worked with Moose before is likely to do in this area. Many of these are relatively harmless in the web interface because of a number of failsafes, but if you ever want to re-use the code in a more stateful environment you will have difficulties. The examples given are, alas, my own code, but I have the benefit of being a newcomer to Moose here and so the lessons are fresh in my mind, or rather codebase.

The first is the use of read-write inputs. A report's output is closely bound to its inputs, so read-write inputs allow the application to misrepresent the report. The example I gave above is:

has 'amount_to' => (is => 'rw', coerce => 1,
                   isa => 'LedgerSMB::Moose::Number');

Now this allows the application to do something like this:

my $report = LedgerSMB::Report::GL->new(%$request);
$report->run_report();
$report->amount_to('10000000');
$report->render;

The above will represent that the report includes a bunch of transactions that may, in fact, be excluded. This is no good. On the other hand, if amount_to were read-only (is => 'ro'), then the call to $report->amount_to('10000000') would throw an error instead.
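The behaviour of a read-only accessor can be imitated in plain Perl without Moose. The sketch below is only an emulation under my own assumptions: My::Sketch::Report is a hypothetical stand-in class, and the error text is made up rather than being Moose's actual message.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a report class; not the real LedgerSMB::Report::GL.
package My::Sketch::Report;

sub new {
    my ($class, %args) = @_;
    return bless { amount_to => $args{amount_to} }, $class;
}

# Read-only accessor: returns the value, refuses any attempt to set it.
sub amount_to {
    my $self = shift;
    die "amount_to is read-only\n" if @_;
    return $self->{amount_to};
}

package main;

my $report = My::Sketch::Report->new(amount_to => '500');

# The attempted change is rejected, so the stored input stays as it was.
my $ok = eval { $report->amount_to('10000000'); 1 };
print "rejected: $@" unless $ok;
print "amount_to is still ", $report->amount_to, "\n";
```

The point of the design is that the failure happens loudly at the moment of the bad assignment, rather than showing up later as a report header that does not match its rows.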

The second major anti-pattern is in the use of Maybe[] as an alternative to required => 0. For example see the following:

has 'approved' => (is => 'rw', isa => 'Maybe[Bool]');

Oh the joys of looking at code I wrote that is in need of rewrite.... Not only do we have a read-write input, but it is maybe boolean (i.e. true, false, or undef).

Now, this appears to work because undef is passed as NULL to the database, and the same is achieved by the more proper:

has approved => (is => 'ro', isa => 'Bool', required => 0);

The difference is that we will not accept as input a case where $request->{approved} = undef has been set. Our query handlers drop empty inputs so there is no case where this should happen. Additionally, this prevents unsetting the attribute after running the report and thus decoupling output from purported input.
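The distinction between "absent" and "present but undef" is easy to see in a small plain-Perl sketch. The drop_empty helper below is hypothetical; it only mimics the idea of a handler that drops empty inputs and is not the actual LedgerSMB query handler.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking a handler that drops empty inputs.
sub drop_empty {
    my ($request) = @_;
    my %clean;
    for my $key (keys %$request) {
        my $value = $request->{$key};
        next unless defined $value && $value ne '';
        $clean{$key} = $value;
    }
    return \%clean;
}

my $clean = drop_empty({ approved => undef, accno => '1200', source => '' });

# 'approved' is now missing from the hash rather than set to undef.
print exists $clean->{approved} ? "approved present\n" : "approved absent\n";
```

With inputs cleaned this way, an optional attribute declared with required => 0 simply never gets set, which is why Maybe[] buys nothing here.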

Defining Report Structure

Report structure is defined using a series of functions which are overridden by actual reports. Some of these functions are optional and some are not. The required ones are covered first.

There are three required functions, namely columns, header_lines, and name. These are expected to return very specific data structures, but function in a largely declarative way. In other words, the functional interface effectively defines them as pseudo-constant (they are not fully constant because they are expected to return the localized names).

In all cases, LedgerSMB::Report::text() can be used to translate a string into its local equivalent (assuming localized strings in the .po files).

The columns function returns an arrayref of hashrefs, each of which is a column definition for our "dynatable" templates. The following are required:

col_id --- the name of the row field to use
type --- the display type of the field (text, href, checkbox, hidden, etc)
name --- localized header for the column

The following are conditionally required or optional:

href_base --- the base of the href. To this is appended the row_id (see below). Only used by href columns, and then required.
pwidth --- Used for width factors for PDF reports.

Here's an example of a columns function for a very simple report (which just lists all SIC codes in the system):

sub columns {
    return [
      { col_id => 'code',
          type => 'href',
     href_base => 'am.pl?action=edit_sic&code=',
          name => LedgerSMB::Report::text('Code'), },

      { col_id => 'description',
          type => 'text',
          name => LedgerSMB::Report::text('Description'), }
    ];
}

In most reports, the columns function is much longer.

The header_lines function provides an arrayref of hashrefs, for displaying inputs on the report. To this, the reporting engine adds the name of the report and the database connected to. If you want no header lines added, you can just return an empty arrayref:

sub header_lines { return []; }

In many cases however, such inputs should be displayed. Each hashref has two components:

name is the name of the input
text is the text label of the input.

Here's a more useful example from LedgerSMB::Report::GL:

sub header_lines {
    return [{name => 'from_date',
             text => LedgerSMB::Report::text('Start Date')},
            {name => 'to_date',
             text => LedgerSMB::Report::text('End Date')},
            {name => 'accno',
             text => LedgerSMB::Report::text('Account Number')},
            {name => 'reference',
             text => LedgerSMB::Report::text('Reference')},
            {name => 'source',
             text => LedgerSMB::Report::text('Source')}];
}

Finally name() returns the localized name of the report. This is usually a very simple function:

sub name {
    return LedgerSMB::Report::text('General Ledger Report');
}

Additionally there are two optional functions, buttons and template, which allow additional flexibility. These are rarely used.

The template function overrides our dynatable-based template as the template to use. This is used mostly in financial statements but is not used in the trial balance, or other fully tabular reports.

If you use it, just return the path to the template to use.

sub template { return 'Reports/PNL' }

Our PNL reporting module has additional features which are beyond the scope of this post.

Finally buttons returns a list of buttons to be included on the report. These follow the element_data format of UI/lib/elements.html and are used to add HTML form callbacks to the report. Here's an example:

sub buttons {
    return [{
         text => LedgerSMB::Report::text('Add New Tax Form'),
         name => 'action',
         type => 'submit',
         class => 'submit'
    }];
}

How Columns are Selected

The columns to display are dynamically selected according to the following rules:

If no column selection criteria are found, then all columns are shown.
If the render() method is called with a hashref argument that includes a member named after the column ID prefixed with 'col_', then that column is shown and columns not so selected are hidden.

What this means is that typically you will define inputs for selection of columns in the $request object before passing it through if you want to have a baseline of columns which always show. Otherwise you will usually allow selection of columns in the filter screen using inputs named as above (i.e. an 'id' field would have a selector named 'col_id').
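The selection rules can be sketched in a few lines of plain Perl. This is my own re-implementation for illustration, not the code in LedgerSMB::Report; select_columns is a hypothetical name, and I am assuming that "no selection criteria" means that none of the report's columns has a matching col_ key in the request.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the selection rule; not LedgerSMB's code.
sub select_columns {
    my ($columns, $request) = @_;
    my @chosen = grep { $request->{ 'col_' . $_->{col_id} } } @$columns;
    # No col_* selectors at all means every column is shown.
    return @chosen ? \@chosen : $columns;
}

my $columns = [
    { col_id => 'code',        type => 'href', name => 'Code' },
    { col_id => 'description', type => 'text', name => 'Description' },
];

# Only the description column was ticked on the filter screen.
my $shown = select_columns($columns, { col_description => 1 });
print join(', ', map { $_->{col_id} } @$shown), "\n";
```

Setting a key such as col_code in the request before rendering is then all it takes to force a baseline column to appear.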

The run_report() Function

The run_report function populates the rows of the report. It should exist and call $self->rows(\@rows) at the end. This is the only portion where specific knowledge of programming in Perl is particularly helpful. However, assuming nothing more than a little knowledge, here is a basic template from the SIC listing:

sub run_report{
    my ($self) = @_;
    my @rows = $self->exec_method(funcname => 'sic__list');
    for my $row(@rows){
        $row->{row_id} = $row->{code};
    }
    $self->rows(\@rows);
}

Going through this line by line:

my ($self) = @_;

The first line of the function body is boilerplate here. It is possible to accept the $request object here as a second parameter, but that is not typical unless you have very specific needs for it. In that case, you simply write:

my ($self, $request) = @_;

Remember that in Perl, @_ is the argument list.

my @rows = $self->exec_method(funcname => 'sic__list');

This line says "run the database function named 'sic__list' and map inputs of the report to function arguments." You would typically just copy this line and change the function name.
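To make "map inputs of the report to function arguments" a little more concrete, here is a plain-Perl sketch of the idea. It is not the real exec_method; map_args is a hypothetical helper, and I am assuming the argument names are already known and follow the 'in_' prefix convention mentioned earlier.

```perl
use strict;
use warnings;

# Hypothetical sketch of argument mapping; not LedgerSMB's exec_method.
sub map_args {
    my ($report, @arg_names) = @_;
    my @values;
    for my $arg (@arg_names) {
        (my $attr = $arg) =~ s/^in_//;   # in_accno -> accno
        push @values, $report->{$attr};  # undef (SQL NULL) if unset
    }
    return @values;
}

# Pretend the database function is declared as f(in_accno, in_source).
my $report = { accno => '1200' };
my @values = map_args($report, qw(in_accno in_source));

print join(', ', map { defined $_ ? $_ : 'NULL' } @values), "\n";
```

Under this picture, an input that is not defined on the report can only ever reach the function as NULL.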

for my $row(@rows){
    $row->{row_id} = $row->{code};
}

If you have any hyperlinks in the report, it is necessary to set a row_id so that this can be properly handled. In any case the row_id is appended to the link from href_base in the column definition. It is possible to override this on a per-column basis but that's beyond the scope of this introduction.
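As a rough illustration of how the link target ends up being built, consider the following plain-Perl sketch. The cell_href helper and the sample row are hypothetical and only show the concatenation; the real templates may also escape the value, which this sketch does not attempt.

```perl
use strict;
use warnings;

# Hypothetical helper showing how a link target is put together.
sub cell_href {
    my ($column, $row) = @_;
    return $column->{href_base} . $row->{row_id};
}

my $column = { col_id    => 'code',
               type      => 'href',
               href_base => 'am.pl?action=edit_sic&code=' };

# Made-up sample row; row_id is set from the code as in run_report.
my $row = { code => '0111', description => 'Sample entry' };
$row->{row_id} = $row->{code};

print cell_href($column, $row), "\n";
```

This is why a row without a row_id produces a link that points at the bare href_base.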

$self->rows(\@rows);

This assigns the rows() of the report to the rows returned. Currently this is handled as a read/write property of reports, but long-run this will probably be changed so that programs cannot override this after running the report.

Ending the File

Always end report classes with the following line:

__PACKAGE__->meta->make_immutable;

This improves performance and ensures that no more attributes can be dynamically added to your report. There are cases where that is less than desirable, but they fall outside the reporting use case.

Others in series:

Introduction
Filter Screens
Best Practices regarding Stored Procedures
(this piece)
Conclusions

Wednesday, November 20, 2013