Wednesday, July 11, 2012

PostgreSQL CTEs and LedgerSMB

LedgerSMB trunk (which will become 1.4) has moved from connectby() to WITH RECURSIVE common table expressions (CTEs).  This post describes our experience, why we have found CTEs to be an extremely useful way to write queries, and why we are moving more and more towards them.  We have also started using CTEs in place of inline views where appropriate.

CTEs were first supported in PostgreSQL 8.4.  They provide a way to write a subquery whose results remain stable for the duration of the query.  CTEs can be simple or recursive (the latter for generating tree-like structures via self-joins).  The stability of CTEs adds both opportunities for better tuning of a query and some performance gotchas: it is important to remember that the optimizer cannot use the rest of the query to optimize the CTE itself (though it presumably can go the other way).
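As a minimal illustration of the recursive form (this toy query is not from the LedgerSMB codebase), a recursive CTE consists of a non-recursive base term and a recursive term that joins back against the CTE itself:

```sql
-- Generate the integers 1 through 5 with a recursive CTE.
-- The base term seeds the working set; the recursive term
-- references the CTE ("counter") until it produces no new rows.
WITH RECURSIVE counter (n) AS (
    SELECT 1          -- base case
    UNION ALL
    SELECT n + 1      -- recursive case
      FROM counter
     WHERE n < 5
)
SELECT n FROM counter;
-- returns 1, 2, 3, 4, 5
```

The menu queries below follow exactly this shape, with menu_node rows whose parent is null as the base case.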

The First Example

The simplest example would be our view, "menu_friendly" which shows the structure and layout of the menu.  The menu is generated using a self-join and this view returns the whole menu.

CREATE VIEW menu_friendly AS
WITH RECURSIVE tree (path, id, parent, level, positions)
AS (select id::text as path, id, parent, 0 as level, position::text
      FROM menu_node where parent is null
     UNION
    select path || ',' || n.id::text, n.id, n.parent, t.level + 1,
           t.positions || ',' || n.position
      FROM menu_node n
      JOIN tree t ON t.id = n.parent)
SELECT t."level", t.path,
       (repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
        n.id, n."position"
   FROM tree t
   JOIN menu_node n USING (id)
  ORDER BY string_to_array(t.positions, ',')::int[];


This replaced an older view using connectby() from the tablefunc module in contrib:

CREATE VIEW menu_friendly AS
SELECT t."level", t.path, t.list_order,
       (repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
        n.id, n."position"
  FROM (connectby('menu_node'::text, 'id'::text, 'parent'::text,
                  'position'::text, '0'::text, 0, ','::text
        ) t(id integer, parent integer, "level" integer, path text,
        list_order integer)
   JOIN menu_node n USING (id));


The replacement query is longer, it is true.  However, it has a number of advantages.  The biggest is that we no longer depend on a contrib module which may or may not come with PostgreSQL (most Linux distributions package the contrib modules separately) and whose installation methodology may change (as it did in PostgreSQL 9.1).

I also think there is value in having the syntax for the self-join be more obvious, rather than having to run to the tablefunc documentation every time one wants to check something.  The code is clearer, and that adds value too.

I haven't compared performance between these two examples, because this view isn't performance-sensitive.  After all, it exists only for troubleshooting the menu structure, and it involves only read operations.  Given other tests, however, I would not expect a performance cost.


Better Performance

LedgerSMB generates the menu for the tree-view in a stored procedure called menu_generate() which takes no arguments.  The 1.3 version is:

CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
        item menu_item;
        arg menu_attribute%ROWTYPE;
BEGIN
        FOR item IN
                SELECT n.position, n.id, c.level, n.label, c.path,
                       to_args(array[ma.attribute, ma.value])
                FROM connectby('menu_node', 'id', 'parent', 'position', '0',
                                0, ',')
                        c(id integer, parent integer, "level" integer,
                                path text, list_order integer)
                JOIN menu_node n USING(id)
                JOIN menu_attribute ma ON (n.id = ma.node_id)
               WHERE n.id IN (select node_id
                                FROM menu_acl
                                JOIN (select rolname FROM pg_roles
                                      UNION
                                     select 'public') pgr
                                     ON pgr.rolname = role_name
                               WHERE pg_has_role(CASE WHEN coalesce(pgr.rolname,
                                                                    'public')
                                                                    = 'public'
                                                      THEN current_user
                                                      ELSE pgr.rolname
                                                   END, 'USAGE')
                            GROUP BY node_id
                              HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
                                                   THEN FALSE
                                                   WHEN acl_type ilike 'ALLOW'
                                                   THEN TRUE
                                                END))
                    or exists (select cn.id, cc.path
                                 FROM connectby('menu_node', 'id', 'parent',
                                                'position', '0', 0, ',')
                                      cc(id integer, parent integer,
                                         "level" integer, path text,
                                         list_order integer)
                                 JOIN menu_node cn USING(id)
                                WHERE cn.id IN
                                      (select node_id FROM menu_acl
                                        JOIN (select rolname FROM pg_roles
                                              UNION
                                              select 'public') pgr
                                              ON pgr.rolname = role_name
                                        WHERE pg_has_role(CASE WHEN
                                                          coalesce(pgr.rolname,
                                                                    'public')
                                                                    = 'public'
                                                      THEN current_user
                                                      ELSE pgr.rolname
                                                   END, 'USAGE')
                                     GROUP BY node_id
                                       HAVING bool_and(CASE WHEN acl_type
                                                                 ilike 'DENY'
                                                            THEN false
                                                            WHEN acl_type
                                                                 ilike 'ALLOW'
                                                            THEN TRUE
                                                         END))
                                       and cc.path like c.path || ',%')
            GROUP BY n.position, n.id, c.level, n.label, c.path, c.list_order
            ORDER BY c.list_order

        LOOP
                RETURN NEXT item;
        END LOOP;
END;
$$ language plpgsql;


This query performs adequately for users with lots of permissions but it doesn't do so well when a user has few permissions.  Even on decent hardware the query may take several seconds to run if the user only has access to items within one menu section.  In many cases this isn't acceptable performance but on PostgreSQL 8.3 we don't have a real alternative outside of trigger-maintained materialized views.  For PostgreSQL 8.4, however, we can do this:

CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
        item menu_item;
        arg menu_attribute%ROWTYPE;
BEGIN
        FOR item IN
               WITH RECURSIVE tree (path, id, parent, level, positions)
                               AS (select id::text as path, id, parent,
                                           0 as level, position::text
                                      FROM menu_node where parent is null
                                     UNION
                                    select path || ',' || n.id::text, n.id,
                                           n.parent,
                                           t.level + 1,
                                           t.positions || ',' || n.position
                                      FROM menu_node n
                                      JOIN tree t ON t.id = n.parent)
                SELECT n.position, n.id, c.level, n.label, c.path, n.parent,
                       to_args(array[ma.attribute, ma.value])
                FROM tree c
                JOIN menu_node n USING(id)
                JOIN menu_attribute ma ON (n.id = ma.node_id)
               WHERE n.id IN (select node_id
                                FROM menu_acl acl
                          LEFT JOIN pg_roles pr on pr.rolname = acl.role_name
                               WHERE CASE WHEN role_name
                                                           ilike 'public'
                                                      THEN true
                                                      WHEN rolname IS NULL
                                                      THEN FALSE
                                                      ELSE pg_has_role(rolname,
                                                                       'USAGE')
                                      END
                            GROUP BY node_id
                              HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
                                                   THEN FALSE
                                                   WHEN acl_type ilike 'ALLOW'
                                                   THEN TRUE
                                                END))
                    or exists (select cn.id, cc.path
                                 FROM tree cc
                                 JOIN menu_node cn USING(id)
                                WHERE cn.id IN
                                      (select node_id
                                         FROM menu_acl acl
                                    LEFT JOIN pg_roles pr
                                              on pr.rolname = acl.role_name
                                        WHERE CASE WHEN rolname
                                                           ilike 'public'
                                                      THEN true
                                                      WHEN rolname IS NULL
                                                      THEN FALSE
                                                      ELSE pg_has_role(rolname,
                                                                       'USAGE')
                                                END
                                     GROUP BY node_id
                                       HAVING bool_and(CASE WHEN acl_type
                                                                 ilike 'DENY'
                                                            THEN false
                                                            WHEN acl_type
                                                                 ilike 'ALLOW'
                                                            THEN TRUE
                                                         END))
                                       and cc.path::text
                                           like c.path::text || ',%')
            GROUP BY n.position, n.id, c.level, n.label, c.path, c.positions,
                     n.parent
            ORDER BY string_to_array(c.positions, ',')::int[]
        LOOP
                RETURN NEXT item;
        END LOOP;
END;
$$ language plpgsql;


This function is not significantly longer, it is quite a bit clearer, and it performs a lot better.  For users with full permissions it runs twice as fast as the 1.3 version, but for users with few permissions, this runs about 70x faster.  The big improvement comes from the fact that we are only instantiating the menu node tree once.
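The pattern at work can be sketched in isolation (this is a simplified illustration, not the production function): the CTE is instantiated once no matter how many times it is referenced, whereas each connectby() call re-ran the self-join against the base table from scratch.

```sql
-- Simplified sketch: "tree" is computed once, then referenced
-- twice -- once in the outer query and once in the EXISTS
-- subquery -- without a second traversal of menu_node.
WITH RECURSIVE tree (id, parent, path) AS (
    SELECT id, parent, id::text
      FROM menu_node
     WHERE parent IS NULL
    UNION
    SELECT n.id, n.parent, t.path || ',' || n.id::text
      FROM menu_node n
      JOIN tree t ON t.id = n.parent
)
SELECT t.id
  FROM tree t
 WHERE EXISTS (SELECT 1
                 FROM tree c   -- same CTE, no re-evaluation
                WHERE c.path LIKE t.path || ',%');
```

In the 1.3 function the equivalent of both references was a separate connectby() scan, which is where most of the 70x difference comes from.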

So not only do we have fewer external dependencies now but there are significant performance improvements.


A Complex Example

CTEs can be used to "materialize" a subquery for a given query.  They can also be nested.  The ordering can be important, because it gives you a great deal of control over exactly what you are joining.
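As a minimal sketch of nesting (table and column names here are hypothetical), a CTE's defining subquery can carry its own WITH clause, and the inner CTE is visible only inside the outer one:

```sql
-- Hypothetical sketch of a nested CTE: "totals" is defined
-- using its own inner CTE "raw", which is scoped to it and
-- invisible to the final SELECT.
WITH totals (customer_id, total) AS (
    WITH raw (customer_id, amount) AS (
        SELECT customer_id, amount FROM invoices
    )
    SELECT customer_id, sum(amount)
      FROM raw
     GROUP BY customer_id
)
SELECT customer_id, total
  FROM totals
 WHERE total > 0;
```

The trial balance query below uses the same structure, with the recursive bu_tree CTE nested inside the ac CTE.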

I won't post the full source code for the new trial balance function here, but the main query, which contains a nested CTE, is worth examining.

Here we are using the CTEs in order to achieve some stability across joins and avoid join projection issues, as well as to create a tree of projects, departments, etc. (business_unit is the table that stores these).

The relevant part of the stored procedure is:

     RETURN QUERY
       WITH ac (transdate, amount, chart_id) AS (
           WITH RECURSIVE bu_tree (id, path) AS (
            SELECT id, id::text AS path
              FROM business_unit
             WHERE parent_id = any(in_business_units)
               OR (parent_id IS NULL
                       AND (in_business_units = '{}'
                             OR in_business_units IS NULL))
            UNION
            SELECT bu.id, bu_tree.path || ',' || bu.id
              FROM business_unit bu
              JOIN bu_tree ON bu_tree.id = bu.parent_id
            )
       SELECT ac.transdate, ac.amount, ac.chart_id
         FROM acc_trans ac
         JOIN (SELECT id, approved, department_id FROM ar UNION ALL
               SELECT id, approved, department_id FROM ap UNION ALL
               SELECT id, approved, department_id FROM gl) gl
                   ON ac.approved and gl.approved and ac.trans_id = gl.id
    LEFT JOIN business_unit_ac buac ON ac.entry_id = buac.entry_id
    LEFT JOIN bu_tree ON buac.bu_id = bu_tree.id
        WHERE ac.transdate BETWEEN t_roll_forward + '1 day'::interval
                                    AND t_end_date
              AND ac.trans_id <> ALL(ignore_trans)
              AND (in_department is null
                 or gl.department_id = in_department)
              AND ((in_business_units = '{}' OR in_business_units IS NULL)
                OR bu_tree.id IS NOT NULL)
       )
       SELECT a.id, a.accno, a.description, a.gifi_accno,
         case when in_date_from is null then 0 else
              CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
              * (coalesce(cp.amount, 0)
              + sum(CASE WHEN ac.transdate <= coalesce(in_date_from,
                                                      t_roll_forward)
                         THEN ac.amount ELSE 0 END)) end,
              sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
                                                         t_roll_forward)
                                                 AND coalesce(in_date_to,
                                                         ac.transdate)
                             AND ac.amount < 0 THEN ac.amount * -1 ELSE 0 END) -
              case when in_date_from is null then coalesce(cp.debits, 0) else 0 end,
              sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
                                                         t_roll_forward)
                                                 AND coalesce(in_date_to,
                                                         ac.transdate)
                             AND ac.amount > 0 THEN ac.amount ELSE 0 END) +
              case when in_date_from is null then coalesce(cp.credits, 0) else 0 end,
              CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
              * (coalesce(cp.amount, 0) + sum(coalesce(ac.amount, 0)))
         FROM account a
    LEFT JOIN ac ON ac.chart_id = a.id
    LEFT JOIN account_checkpoint cp ON cp.account_id = a.id
              AND end_date = t_roll_forward
        WHERE (in_accounts IS NULL OR in_accounts = '{}'
               OR a.id = ANY(in_accounts))
              AND (in_heading IS NULL OR in_heading = a.heading)
     GROUP BY a.id, a.accno, a.description, a.category, a.gifi_accno,
              cp.end_date, cp.account_id, cp.amount, cp.debits, cp.credits
     ORDER BY a.accno;


This query does a lot, including rolling balances forward from checkpoints (which also effectively close the books), pulling only transactions which match specific departments or projects, and more.


Conclusion

We have found CTEs to be a tool whose importance and usefulness are hard to overstate.  They have given us maintainability and performance advantages, and they are very flexible.  We will probably move towards using more of them in the future as appropriate.

Saturday, July 7, 2012

Progress towards LedgerSMB 1.4

I have been relatively submerged in work and haven't gotten back to blogging as much as I would have liked.  On the whole this is a good thing.

A lot of things are changing in the LedgerSMB community for the better.  For starters there has been a lot of work towards making the site more manageable. We expect this to be a lot better than it has been in the past.

Additionally, my business is planning to address important, as-yet-unmet needs that open source accounting/ERP software must meet if it is to reach the mainstream.  I will probably post more on this later.

Similarly development has accelerated a great deal until the last couple weeks when I have had to focus on other things.  In the next series of posts I expect to talk a bit about how we are utilizing CTE's and the like in LedgerSMB 1.4 and how new features of PostgreSQL are making our lives a lot easier.  I will say that the more I work with CTE's the more I am convinced this is a feature which is highly underrated.    I can't wait until we can start requiring PostgreSQL 9.1 and we can use them in write operations.

I will probably also talk a bit about Moose, why we are moving to it, our experiences, and the like.  Some other discussions like changes to our architecture may be discussed as well.

However for now, I figure a progress report is due.

  • A basic web services framework has been completed and work is ongoing to expose customers and vendors over the framework.
  • Many CRM improvements have been added, including lead tracking and the ability to attach documents to customer/vendor records, companies, or persons.  Many of these improvements affect employees too.
  • A lot of code and schema clean-up in the CRM and employee management areas.
  • Projects and departments have been replaced by a more flexible system of reporting dimensions supporting funds accounting and more.
  • A new report framework has been written and many reports have already been moved to it.
To get to 1.4 beta, we basically need to complete payroll (which is not a trivial task) and get customers/vendors working via web services.  We also need to move the financial statements to the new framework.

I would say we are most of the way there. 

Thursday, May 31, 2012

Wonderful Uses of Unit Testing in PostgreSQL DB's

I am sure many readers will read the headline of this post and immediately think of pgTAP, which has become the de facto way to write unit tests in PostgreSQL db's.  The sad part is that we on the LedgerSMB team have not yet gotten around to porting our unit tests to pgTAP.  We have a slightly different approach which works in a similar way.  But this post isn't really about frameworks, but rather applications of the concept of unit testing.

Just to give folks some background: we started adding database-level unit tests to LedgerSMB back in 2007, when pgTAP was still an alpha, and at the time we thought we should wait for pgTAP to prove itself (pun intended) before relying on it as testing infrastructure.  It has certainly matured since then.  Our approach instead is to write unit tests in SQL which write to a temporary table called test_result.  At the end of the test run, we select from test_result, provide a report of successes and failures, and roll back the transaction.  Of course, if you encounter an exception part way through, your test run will die and you won't have access to the test results until it can run through to completion.
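A minimal sketch of this pattern (the test case itself is hypothetical, not one of LedgerSMB's actual tests) looks like:

```sql
BEGIN;

-- Results accumulate here; the temp table and all test data
-- vanish when the transaction rolls back.
CREATE TEMP TABLE test_result (test_name text, success boolean);

-- Hypothetical test case: insert using a negative id, which by
-- convention stays clear of real data even before the rollback.
INSERT INTO menu_node (id, label, parent, position)
VALUES (-9999, 'test node', NULL, 1);

INSERT INTO test_result
SELECT 'test node is retrievable',
       count(*) = 1
  FROM menu_node
 WHERE id = -9999;

-- Report, then throw everything away.
SELECT * FROM test_result;
ROLLBACK;
```

The Perl harness described below simply reads the output of that final SELECT.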

The test results are then parsed in Perl and fed back through Perl's test harness.  This approach has a few advantages and a few disadvantages compared to other frameworks.  I like the fact that all the results are visible at the end.  It makes debugging easy when there is a failure, because you can look at the full pattern of successes and failures and then scroll back to find any debugging information that the developer felt like outputting between test cases.

However what I really want to talk about is how these are or can be used in a mission critical environment.  I think most people would agree that software that tracks money is mission critical at least in terms of its data integrity.  If the data integrity suffers, everything in the business can fall apart quite quickly.  So ensuring data integrity for accounting software that may be customized by third parties is both a hard job and one that is absolutely necessary.

One key aspect of unit tests in this case is that they leave no relevant lasting impact on the database they are run against.  While there are some exceptions outside transactional control (sequences and the like), usually we don't care much whether those are in the state they would have been in but for the test run.  The ability to run unit tests on a live system, without the possibility of screwing something up, makes customization and rapid development possible in environments where they would not otherwise be.  (In actuality we have conventions offering multiple layers of protection against data corruption.  For example, serial values are typically assigned in the test case from a range of negative numbers, and are thus unlikely to conflict with existing real values.  However, this is nothing compared to the ability to simply roll back the transactions.)

Consequently you can use unit tests on a live system as a measure of ensuring that everything is ok.  A very useful corollary here is that you can use unit tests as a troubleshooting mechanism.  Something not working as expected?  Maybe the first thing to do is run the unit tests and see if anything fails.  You can thus add all sorts of checks that are useful for support and deployment purposes.

One key example here is the fact that the behavior of LedgerSMB's query mapper is undefined where the target function is overloaded and no argument types are declared.  Consequently it is very important to know which functions are overloaded that shouldn't be.  We actually have a test for overloaded functions, and we typically run it against misbehaving databases to see if there are issues there.
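A check along those lines can be sketched with a query against the system catalogs (an illustration of the idea, not LedgerSMB's actual test):

```sql
-- List function names defined more than once in the public schema,
-- i.e. overloaded functions that a caller could not disambiguate
-- without declared argument types.
SELECT p.proname, count(*) AS definitions
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'public'
 GROUP BY p.proname
HAVING count(*) > 1
 ORDER BY p.proname;
```

Run against a misbehaving database, any unexpected names in this list are immediate suspects.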

Occasionally I have recommended to customers that they wait to install add-ons until unit tests have been written.  This has never yet been the wrong call.

I am a firm believer in unit testing in the db.  Such tests are useful well beyond the QA environment: they serve equally well as general diagnostic tools.  Therefore I think every database application should be using them.

Monday, May 21, 2012

LedgerSMB now in Debian!


As Robert James Clay announced today:

    The 1.3.15-2 package for LedgerSMB has been accepted into Debian
    unstable.  I expect that version of the package to migrate to Debian
    testing (wheezy) by the end of this month.

    LedgerSMB Debian source package page:
    http://packages.qa.debian.org/l/ledgersmb.html

    Besides becoming available in the Debian repositories, information
    about downloading the current package manually is also available at:
    http://packages.debian.org/sid/all/ledgersmb/download

Saturday, May 12, 2012

Thoughts on VoltDB and performance vs flexibility

VoltDB is the latest project of Mike Stonebraker (creator of the original Postgres), aimed at very high performance OLTP.  It is fairly clearly a niche product, however, and not really usable in the ERP space, for reasons I will discuss below.

Stonebraker's own presentations (one example, at 11:25) suggest he is aware of this issue, given his triangle framework.

At the vertices of the triangle he places NoSQL, high performance OLTP (like VoltDB), and column-oriented databases, while in the middle, as lower performance generalists, sit what he calls the legacy databases or elephants.  The argument is that you can get a significant performance boost by specializing your systems beyond what you could when every system does everything.  This is a fair argument to a point, but the implementation details show that it is true only in some areas.

Stonebraker's analysis of processing time in traditional databases (see above presentation, at 14:36) places virtually all processing time in four areas, namely buffer management, locking, thread locks/semaphores, and recovery tasks, and he suggests that in order to get high degrees of performance one must eliminate these tasks.  This requires, however, rethinking how we address concurrency (the locking issues) and durability (recovery and disk-based storage).  The VoltDB approach is to get rid of concurrency entirely and to treat durability as a network, rather than a single-system, property.  Eliminating concurrency is acceptable because the db is fast enough that queries can be run one at a time, and intraquery parallelism can be used instead of interquery parallelism.  However, this imposes significant limitations on the database, because it means that every transaction is limited to a single query.  You can't do round-tripping in your transactions, because that would impose locking requirements on the database.

This approach works very well for certain kinds of processing, such as consuming rapid data feeds and then feeding that information into a data warehouse at specified intervals.  However, one thing generally missing from the discussion is that the more complex the application, the more general the database solution needs to be.  One solution might be to separate data entry from reporting and use multiple tools, but who wants to pull their trial balance from a different system than the one they enter invoices on?  Since ERP systems usually tightly integrate decision support and OLTP, there isn't really room here to use specialized databases like VoltDB.

As the database market has expanded, it has created niches for products like VoltDB.  These niches may indeed grow with time.  However, I think it is incorrect to treat generalists as legacy, outdated approaches.

In essence VoltDB typifies what I call the specialism paradox: to perform well in a specialist niche, one must give up generalist solutions.  It is true that new systems sometimes improve on and supplant older ones, but at least as often they do so only when they are not vastly more complex than the systems they are replacing.  For example, Stonebraker's comparison of high performance OLTP to the shift from cable-operated excavation equipment to hydraulics misses the fact that hydraulic systems are not that much more complex than cables.  There is some additional complexity, but it isn't a huge gap.  A better comparison might be semi trucks versus race cars and locomotives.  These are all highly engineered systems, but the basic design considerations are very different, and specialization comes at the cost of flexibility.  You will never see race cars hauling twenty-foot containers of furniture cross-country, and you will never see a locomotive travelling along our nation's highways to reach out-of-the-way towns.  The semi, however, will never pull the loads that the locomotive can, and will never win the Indy 500.  But it can go fast enough, and it can pull the loads it needs to pull.

Saturday, March 17, 2012

Why LedgerSMB is Headed for Greatness

Edited 2012-03-21 based on feedback from Ned Lilly.

Ok, so this post is going to be about the ERP market, and why LedgerSMB is headed for greatness.  We are a long-standing open source project and are not going anywhere soon.  There are significant strategic reasons to choose LedgerSMB over others.

We have made a lot of progress since we started, and that progress is expected to continue to accelerate.  Here's why....


PostgreSQL

First, we are great because of PostgreSQL.  There is no getting around that.  PostgreSQL is an awesome database management system for ERPs.  While there are some things it doesn't yet do as well as I would like, no other RDBMS does these either (see my recent post about this).

However, what PostgreSQL offers is an extremely powerful set of tools for addressing an ERP environment, without per-connection license costs.  The latter is especially important because it allows additional utility programs to connect to the database and perform tasks without requiring additional client access licenses, and there is a lot that can be done with the flexibility that results.

Currently we take heavy advantage of PL/pgSQL and various built-in types and functions.  While there has been some talk of using PL/Perl and other stored procedure languages, we haven't yet found a case where this makes sense.

It's worth remembering that most of the work in an ERP system is in storing, retrieving, and displaying information.  The database is the ideal place to put all of the logic regarding storing and retrieving the information, as well as some of the display logic.  Consequently we are becoming more database-heavy and lighter-weight on the Perl side.

Less is More

In LedgerSMB we focus aggressively on overall architecture, with the idea of flexible, re-usable modules which get the logic right the first time around.  For 1.4, we are removing the project and department system and replacing it with a more general "business reporting unit" system.  This allows us to offer project accounting, department reports, manufacturing job costing, and more, without dedicated code to each task.

Differences are then handled by workflow scripts rather than by the model.  This ensures that the application can eventually be a simple yet powerful ERP development environment rather than an application which must be everything to everybody.

Each major release of LedgerSMB has had more features but a smaller code base than the previous version.

This approach is good for a number of reasons.  First it places emphasis on where the emphasis needs to be:  on LedgerSMB as a platform.  Secondly it encourages flexibility in implementation while focusing on getting the details regarding accounting right.  It also gives us fewer maintenance points and hence a better, higher quality codebase.  There are many more reasons why that is the case, but in general we try hard to do more with less.

LedgerSMB is a Multi-Vendor Project

Most open source ERPs are single-vendor products whose vendors make money selling enterprise versions.  This means that the community version is basically a sales tool, and many features will not be added for fear of hurting the vendor who sponsors the tool.

LedgerSMB began as a joint project between Metatron Technology Consulting and Command Prompt Inc, with a number of other consultants joining in early.  While the ecosystem is constantly in flux as new individuals enter the project and old ones leave for one reason or another, one theme remains the same:  you can get top-notch support from a variety of sources, and no company retains a monopoly on commit privileges.

This changes the dynamics of the project.  Instead of releasing a moderately functional tool open source and then charging for advanced features, we release most things open source and only where that model breaks down do we look at subscriptions.  This allows us to do things that other open source projects in the area have not yet been able to really try to do in a significant way, such as good working payroll systems with maintained rules for various tax jurisdictions (forthcoming in 1.4 at least for a couple of jurisdictions with more coming).  Instead of paying for fixed asset management, you can save that money and put it towards things that matter, like maintaining payroll logic.  The logic "sold" under subscription can even remain open source, since the major problem is just that of keeping the code up to compliance with the latest regulations.

LedgerSMB is a Development Platform

LedgerSMB 1.3 and higher isn't intended to be just an ERP application, but rather a development platform for helping businesses implement ERP applications in accordance with their workflows.  The framework, while minimalist, is designed to facilitate rather than get in the way.

The framework is designed to get your teams of developers up and running as quickly as possible.  The object model is designed to be clean, and most of the actual work is shipped off to the database via stored procedures, handed off to modules that parse templates, or handed off to various CPAN modules.  The actual Perl code is basically glue that holds the rest together.

One advantage to this sort of platform is that reference implementations can be created on other platforms as well, and the basic framework can be set up so that it can be used for non-interactive batch jobs or thick clients, not just web applications.  In fact, that has been a major guiding feature of our framework.

Secure by Default

LedgerSMB has made security a top priority since the beginning of the project.  By default users cannot do anything in LedgerSMB, and beyond that, even if they find and exploit holes in the web application, security is still enforced on the backend.

We are trying hard to provide industry-leading security, of a kind that can be enforced even when integration with other software is at issue.  We also have plans to support Kerberos 5 authentication as soon as it is needed.


Comparisons to xTuple

LedgerSMB and xTuple share a common approach to the database.  We are both heavily stored-procedure centric.   Both of us focus heavily on the database as an engineering discipline in itself.  This means that both of us focus heavily on interoperability with other applications and a good API.


There are, however, two important differences.  xTuple is a C++/Qt application, while LedgerSMB is written in Perl.  This means that LedgerSMB can deliver rapid application development possibilities in a way that xTuple cannot.  Rapid application development around xTuple would require using other existing frameworks; rapid application development around LedgerSMB can be done within the LedgerSMB framework itself.  This first difference is critical because larger businesses fundamentally buy ERP frameworks as development platforms, while smaller businesses haven't yet found the value in doing this (something LedgerSMB hopes to change).

Secondly, xTuple is a single-vendor solution.  It is offered by a single company, which owns the codebase, and sells add-ons.  This means that the incentive for the company is to release as little as they can and remain viable while selling licenses to as much as they can.  Since LedgerSMB is multi-vendor, the incentive is to provide as much open source as possible and sell as few addons as possible.  One can see already that LedgerSMB comes with fixed asset management for which xTuple charges about $700.  As we continue to develop, those areas where xTuple is currently ahead (such as manufacturing) will probably disappear, and the areas where we offer features that xTuple sells for profit will likely increase.

The above paragraph has been stricken because, as Ned Lilly points out, it is inaccurate.  In fact the fixed asset tool was developed by a third party, and this is one area of the market they cultivate.  He points out further that this is an example of rapid application development in Postbooks, and therefore I am incorrect there.  He further asks me to accept that their incentive is not to release a minimalistic product.  If their marketplace materials are not mostly made by them, then I would agree on all these points.  Points accepted.

I will, however, stand by the point that single-vendor solutions generally have incentives to release minimalistic offerings in order to encourage folks to buy licenses.  Kudos to xTuple for not falling into that trap.

I will also have to look a lot more closely at how this JavaScript automation is done.  It may be possible, in essence, to do a fair bit more than I had considered here.


Comparisons to OpenERP

OpenERP is an open source, single-vendor ERP solution written in Python.  It features a pleasing interface, a modular structure, and so forth.  However, the same problems of single-vendor management apply.

OpenERP offers rapid application development possibilities both in Python, using the framework provided, and in its "application builder."  However, what OpenERP as a business is really interested in selling you is a software license for the full version, which gives you exemptions from the requirements of the AGPL v3 as well as technical support.  This makes it harder, I think, to build a larger development community around the software.

Comparisons to OpenTAPS

OpenTAPS is an ERP application and framework that is multi-vendor, built in Java, and supports multiple databases.  It is based on the Apache Open For Business (OFBiz) ERP framework.

OpenTAPS is impressive feature-wise, and it is used by some major corporations, but the choice of Java as a language means that development is less rapid than it would be in Perl or Python.  The framework's capabilities are therefore more within the reach of large businesses, which likely have internal development teams familiar with Java, than of small ones.  This means that small businesses are still faced with the necessity of building business processes around the application rather than the other way around.  It is therefore more of an upper-market product, while LedgerSMB currently shoots primarily for the midrange.

Tuesday, March 13, 2012

3 Things I Wish PostgreSQL Did

There are a few times when working with PostgreSQL that I wish it could do things that neither it nor other relational databases can do at present.  In my view (and I will defend each of these feature requests), these all fit well within a relational-db-centric world.


3:  XML/JSON to tuple type

One of the problems one runs into in a stored-procedure-centric application is managing the input and output of the application.  An ability to convert XML or JSON to a tuple type would help greatly (remember that tuples can have members that are themselves tuples, or arrays of tuples).  The formats are semantically equivalent, so why not allow for conversion?
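A rough sketch of what I have in mind (the function name json_to_tuple and its signature are hypothetical, not an existing API):

```sql
-- A composite type describing, say, a customer row:
CREATE TYPE customer AS (id int, name text, credit_limit numeric);

-- Hypothetical conversion: take a JSON document and return a value of
-- the composite type, mapping keys to attributes by name.
SELECT * FROM json_to_tuple(
    '{"id": 1, "name": "Acme", "credit_limit": 5000}',
    NULL::customer
);
```

Since composite types can nest, the same mechanism would naturally handle nested JSON objects and arrays of objects as tuple members and arrays of tuples.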


2:  Nested Namespaces

Managing hundreds or thousands of stored procedures can be a problem in a flat namespace.  We end up semantically creating hierarchical function names, or hierarchically named namespaces, in a flat-namespace world.  Even if one cannot do it for tables, there ought to be some way to do this for functions.

Part of the problem here is that the SQL standards give semantic value to different namespace lengths, so this would have to be solved.  It is not an easy problem and it may not have a solution.  Perhaps a package delimiter within a namespace would be helpful?  Maybe a character like # or $?
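To illustrate the workaround: today the hierarchy has to be faked inside the function name itself.  The double-underscore convention below is just illustrative:

```sql
-- Flat namespace today: the "package" lives in the name.
CREATE FUNCTION company__save(in_name text) RETURNS int
LANGUAGE sql AS $$ SELECT 1 $$;

-- What nesting might look like (hypothetical syntax; it conflicts with
-- the standard meaning of schema.function, which is exactly the problem):
-- CREATE FUNCTION company.save(in_name text) ...
```

With a sanctioned package delimiter, tooling could group and introspect related procedures instead of parsing naming conventions.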

1:  Rich Declarative Constraints for Accounting Applications

It seems strange to me that accounting applications have been one of the primary uses of RDBMS's since their inception, and yet there is no type of declarative constraint to handle ensuring that transactions are balanced.

Whether a transaction is balanced or not is fundamentally a set-based question.  A transaction is balanced when the sum of the debits of its rows equals the sum of the credits of its rows.  In LedgerSMB, debits are negative amounts and credits are positive amounts (they are merely presented to the user as fundamentally different).  So I would like the ability to do something like:

CHECK FOR EACH TRANSACTION (SUM(amount) = 0) GROUP BY trans_id;

Part of the problem I suppose is that getting this right in a row-locking environment is hard.  However, in an MVCC environment it should be possible to check if this constraint matches the deleted and inserted rows, and if it does (0 + 0 = 0), we know we are balanced (deletions would only happen for unapproved transactions, i.e. ones that have not yet hit the books).  Such a check would only fire once per transaction, and only check rows modified by the transaction.  It therefore shouldn't be problematic in the way an aggregate check would be over the entire table.
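What one can do today is approximate this with a deferred constraint trigger that re-checks the affected transaction at commit time.  A sketch (the table and column names acc_trans, trans_id, and amount follow LedgerSMB's schema; deletions would need a companion trigger using OLD):

```sql
-- Re-check the balance of the transaction a modified row belongs to.
CREATE FUNCTION check_balanced() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF (SELECT sum(amount) FROM acc_trans
         WHERE trans_id = NEW.trans_id) <> 0
    THEN
        RAISE EXCEPTION 'Transaction % is not balanced', NEW.trans_id;
    END IF;
    RETURN NULL;
END;
$$;

-- Deferred to commit, so intermediate unbalanced states are tolerated.
CREATE CONSTRAINT TRIGGER acc_trans_balanced
AFTER INSERT OR UPDATE ON acc_trans
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE check_balanced();
```

The weakness, and part of why a declarative form would be better, is that this fires once per modified row rather than once per transaction, re-summing the same transaction repeatedly.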

The other two are nice, but if PostgreSQL could do this very well, it would be the king of databases for accounting software, and I am sure the feature would find new uses elsewhere.

So, what's on your list?