Wednesday, July 11, 2012

PostgreSQL CTEs and LedgerSMB

LedgerSMB trunk (which will become 1.4) has moved from connectby() to WITH RECURSIVE common table expressions (CTE's).    This post describes our experience, and why we have found that CTE's are an extremely useful way to write queries and why we are moving more and more towards these.  We have started using CTE's more frequently in place of inline views when appropriate as well.

CTEs are first supported in PostgreSQL 8.4.  They represent a way to write a subquery whose results remain stable for the duration of the query.  CTE's can be simple or recursive (for generating tree-like structures via self-joins).  the stability of CTEs both adds opportunities for better tuning a query and performance gotchas, and it is important to remember that the optimizer cannot correlate expected results from a CTE with the rest of the query to optimize the CTE (but presumably can go the other way).

The First Example

The simplest example would be our view, "menu_friendly" which shows the structure and layout of the menu.  The menu is generated using a self-join and this view returns the whole menu.

CREATE VIEW menu_friendly AS
WITH RECURSIVE tree (path, id, parent, level, positions) 

AS (select id::text as path, id, parent, 0 as level, position::text
      FROM menu_node where parent is null
     UNION
    select path || ',' || n.id::text, n.id, n.parent, t.level + 1,
           t.positions || ',' || n.position
      FROM menu_node n
      JOIN tree t ON t.id = n.parent)
SELECT t."level", t.path,
       (repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
        n.id, n."position"
   FROM tree t
   JOIN menu_node n USING (id)
  ORDER BY string_to_array(t.positions, ',')::int[];


This replaced an older view using connectby() from the tablefunc module in contrib:

CREATE VIEW menu_friendly AS
SELECT t."level", t.path, t.list_order,
       (repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
        n.id, n."position"
  FROM (connectby('menu_node'::text, 'id'::text, 'parent'::text,
                  'position'::text, '0'::text, 0, ','::text
        ) t(id integer, parent integer, "level" integer, path text,
        list_order integer)
   JOIN menu_node n USING (id));


The replacement query is longer, it is true,  However it has a number of advantages.  The biggest is that we no longer have to depend on a contrib module which may or may not come with PostgreSQL (most Linux distributions package these in a separate package) and where installation methodology may change (as it did in PostgreSQL 9.1).

I also think there is some value in having the syntax for the self-join a bit more obvious rather than having to run to the tablefunc documentation every time one wants to check.   The code is clearer and this adds value too.

I haven't checked performance between these two examples, because this isn't very performance-sensitive.  After all it is only there for troubleshooting menu structure, and it is only a set of read operations.  Given other tests however I would not expect a performance cost.


Better Performance

LedgerSMB generates the menu for the tree-view in a stored procedure called menu_generate() which takes no arguments.  The 1.3 version is:

CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
        item menu_item;
        arg menu_attribute%ROWTYPE;
BEGIN
        FOR item IN
                SELECT n.position, n.id, c.level, n.label, c.path,
                       to_args(array[ma.attribute, ma.value])
                FROM connectby('menu_node', 'id', 'parent', 'position', '0',
                                0, ',')
                        c(id integer, parent integer, "level" integer,
                                path text, list_order integer)
                JOIN menu_node n USING(id)
                JOIN menu_attribute ma ON (n.id = ma.node_id)
               WHERE n.id IN (select node_id
                                FROM menu_acl
                                JOIN (select rolname FROM pg_roles
                                      UNION
                                     select 'public') pgr
                                     ON pgr.rolname = role_name
                               WHERE pg_has_role(CASE WHEN coalesce(pgr.rolname,
                                                                    'public')
                                                                    = 'public'
                                                      THEN current_user
                                                      ELSE pgr.rolname
                                                   END, 'USAGE')
                            GROUP BY node_id
                              HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
                                                   THEN FALSE
                                                   WHEN acl_type ilike 'ALLOW'
                                                   THEN TRUE
                                                END))
                    or exists (select cn.id, cc.path
                                 FROM connectby('menu_node', 'id', 'parent',
                                                'position', '0', 0, ',')
                                      cc(id integer, parent integer,
                                         "level" integer, path text,
                                         list_order integer)
                                 JOIN menu_node cn USING(id)
                                WHERE cn.id IN
                                      (select node_id FROM menu_acl
                                        JOIN (select rolname FROM pg_roles
                                              UNION
                                              select 'public') pgr
                                              ON pgr.rolname = role_name
                                        WHERE pg_has_role(CASE WHEN 

                                                          coalesce(pgr.rolname,
                                                                    'public')
                                                                    = 'public'
                                                      THEN current_user
                                                      ELSE pgr.rolname
                                                   END, 'USAGE')
                                     GROUP BY node_id
                                       HAVING bool_and(CASE WHEN acl_type
                                                                 ilike 'DENY'
                                                            THEN false
                                                            WHEN acl_type
                                                                 ilike 'ALLOW'
                                                            THEN TRUE
                                                         END))
                                       and cc.path like c.path || ',%')
            GROUP BY n.position, n.id, c.level, n.label, c.path, c.list_order
            ORDER BY c.list_order

        LOOP
                RETURN NEXT item;
        END LOOP;
END;
$$ language plpgsql;


This query performs adequately for users with lots of permissions but it doesn't do so well when a user has few permissions.  Even on decent hardware the query may take several seconds to run if the user only has access to items within one menu section.  In many cases this isn't acceptable performance but on PostgreSQL 8.3 we don't have a real alternative outside of trigger-maintained materialized views.  For PostgreSQL 8.4, however, we can do this:

CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
        item menu_item;
        arg menu_attribute%ROWTYPE;
BEGIN
        FOR item IN
               WITH RECURSIVE tree (path, id, parent, level, positions)
                               AS (select id::text as path, id, parent,
                                           0 as level, position::text
                                      FROM menu_node where parent is null
                                     UNION
                                    select path || ',' || n.id::text, n.id,
                                           n.parent,
                                           t.level + 1,
                                           t.positions || ',' || n.position
                                      FROM menu_node n
                                      JOIN tree t ON t.id = n.parent)
                SELECT n.position, n.id, c.level, n.label, c.path, n.parent,
                       to_args(array[ma.attribute, ma.value])
                FROM tree c
                JOIN menu_node n USING(id)
                JOIN menu_attribute ma ON (n.id = ma.node_id)
               WHERE n.id IN (select node_id
                                FROM menu_acl acl
                          LEFT JOIN pg_roles pr on pr.rolname = acl.role_name
                               WHERE CASE WHEN role_name
                                                           ilike 'public'
                                                      THEN true
                                                      WHEN rolname IS NULL
                                                      THEN FALSE
                                                      ELSE pg_has_role(rolname,
                                                                       'USAGE')
                                      END
                            GROUP BY node_id
                              HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
                                                   THEN FALSE
                                                   WHEN acl_type ilike 'ALLOW'
                                                   THEN TRUE
                                                END))
                    or exists (select cn.id, cc.path
                                 FROM tree cc
                                 JOIN menu_node cn USING(id)
                                WHERE cn.id IN
                                      (select node_id
                                         FROM menu_acl acl
                                    LEFT JOIN pg_roles pr
                                              on pr.rolname = acl.role_name
                                        WHERE CASE WHEN rolname
                                                           ilike 'public'
                                                      THEN true
                                                      WHEN rolname IS NULL
                                                      THEN FALSE
                                                      ELSE pg_has_role(rolname,
                                                                       'USAGE')
                                                END
                                     GROUP BY node_id
                                       HAVING bool_and(CASE WHEN acl_type
                                                                 ilike 'DENY'
                                                            THEN false
                                                            WHEN acl_type
                                                                 ilike 'ALLOW'
                                                            THEN TRUE
                                                         END))
                                       and cc.path::text
                                           like c.path::text || ',%')
            GROUP BY n.position, n.id, c.level, n.label, c.path, c.positions,
                     n.parent
            ORDER BY string_to_array(c.positions, ',')::int[]
        LOOP
                RETURN NEXT item;
        END LOOP;
END;
$$ language plpgsql;


This function is not significantly longer, it is quite a bit clearer, and it performs a lot better.  For users with full permissions it runs twice as fast as the 1.3 version, but for users with few permissions, this runs about 70x faster.  The big improvement comes from the fact that we are only instantiating the menu node tree once.

So not only do we have fewer external dependencies now but there are significant performance improvements.


A Complex Example

CTE's can be used to "materialize" a subquery for a given query.  They can also be nested.  The ordering can be important because this can give you a great deal of control as to what exactly you are joining.

I won't post the full source code for the new trial balance function here.  However, the main query, which contains a nested CTE.

Here we are using the CTE's in order to achieve some stability across joins and avoid join projection issues, as well as creating a tree of projects, departments, etc (business_unit is the table that stores this).

The relevant part of the stored procedure is:

     RETURN QUERY
       WITH ac (transdate, amount, chart_id) AS (
           WITH RECURSIVE bu_tree (id, path) AS (
            SELECT id, id::text AS path
              FROM business_unit
             WHERE parent_id = any(in_business_units)
                   OR (parent_id = IS NULL
                       AND (in_business_units = '{}'
                             OR in_business_units IS NULL))
            UNION
            SELECT bu.id, bu_tree.path || ',' || bu.id
              FROM business_unit bu
              JOIN bu_tree ON bu_tree.id = bu.parent_id
            )
       SELECT ac.transdate, ac.amount, ac.chart_id
         FROM acc_trans ac
         JOIN (SELECT id, approved, department_id FROM ar UNION ALL
               SELECT id, approved, department_id FROM ap UNION ALL
               SELECT id, approved, department_id FROM gl) gl
                   ON ac.approved and gl.approved and ac.trans_id = gl.id
    LEFT JOIN business_unit_ac buac ON ac.entry_id = buac.entry_id
    LEFT JOIN bu_tree ON buac.bu_id = bu_tree.id
        WHERE ac.transdate BETWEEN t_roll_forward + '1 day'::interval
                                    AND t_end_date
              AND ac.trans_id <> ALL(ignore_trans)
              AND (in_department is null
                 or gl.department_id = in_department)
              ((in_business_units = '{}' OR in_business_units IS NULL)
                OR bu_tree.id IS NOT NULL)
       )
       SELECT a.id, a.accno, a.description, a.gifi_accno,
         case when in_date_from is null then 0 else
              CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
              * (coalesce(cp.amount, 0)
              + sum(CASE WHEN ac.transdate <= coalesce(in_date_from,
                                                      t_roll_forward)
                         THEN ac.amount ELSE 0 END)) end,
              sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
                                                         t_roll_forward)
                                                 AND coalesce(in_date_to,
                                                         ac.transdate)
                             AND ac.amount < 0 THEN ac.amount * -1 ELSE 0 END) -
              case when in_date_from is null then coalesce(cp.debits, 0) else 0 end,
              sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
                                                         t_roll_forward)
                                                 AND coalesce(in_date_to,
                                                         ac.transdate)
                             AND ac.amount > 0 THEN ac.amount ELSE 0 END) +
              case when in_date_from is null then coalesce(cp.credits, 0) else 0 end,
              CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
              * (coalesce(cp.amount, 0) + sum(coalesce(ac.amount, 0)))
         FROM account a
    LEFT JOIN ac ON ac.chart_id = a.id
    LEFT JOIN account_checkpoint cp ON cp.account_id = a.id
              AND end_date = t_roll_forward
        WHERE (in_accounts IS NULL OR in_accounts = '{}'
               OR a.id = ANY(in_accounts))
              AND (in_heading IS NULL OR in_heading = a.heading)
     GROUP BY a.id, a.accno, a.description, a.category, a.gifi_accno,
              cp.end_date, cp.account_id, cp.amount, cp.debits, cp.credits
     ORDER BY a.accno;


This query does a lot including rolling balances froward from checkpoints (which also effectively close the books), pulling only transactions which match specific departments or projects, and more.


Conclusion

We have found the CTE to be a tool which is hard to underrate in importance and usefulness.  They have given us maintainability and performance advantages, and they are very flexible.  We will probably move towards using more of them in the future as appropriate.

No comments:

Post a Comment