CTEs are first supported in PostgreSQL 8.4. They represent a way to write a subquery whose results remain stable for the duration of the query. CTE's can be simple or recursive (for generating tree-like structures via self-joins). the stability of CTEs both adds opportunities for better tuning a query and performance gotchas, and it is important to remember that the optimizer cannot correlate expected results from a CTE with the rest of the query to optimize the CTE (but presumably can go the other way).
The First Example
The simplest example would be our view, "menu_friendly" which shows the structure and layout of the menu. The menu is generated using a self-join and this view returns the whole menu.
CREATE VIEW menu_friendly AS
WITH RECURSIVE tree (path, id, parent, level, positions)
AS (select id::text as path, id, parent, 0 as level, position::text
FROM menu_node where parent is null
UNION
select path || ',' || n.id::text, n.id, n.parent, t.level + 1,
t.positions || ',' || n.position
FROM menu_node n
JOIN tree t ON t.id = n.parent)
SELECT t."level", t.path,
(repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
n.id, n."position"
FROM tree t
JOIN menu_node n USING (id)
ORDER BY string_to_array(t.positions, ',')::int[];
This replaced an older view using connectby() from the tablefunc module in contrib:
SELECT t."level", t.path, t.list_order,
(repeat(' '::text, (2 * t."level")) || (n.label)::text) AS label,
n.id, n."position"
FROM (connectby('menu_node'::text, 'id'::text, 'parent'::text,
'position'::text, '0'::text, 0, ','::text
) t(id integer, parent integer, "level" integer, path text,
list_order integer)
JOIN menu_node n USING (id));
The replacement query is longer, it is true, However it has a number of advantages. The biggest is that we no longer have to depend on a contrib module which may or may not come with PostgreSQL (most Linux distributions package these in a separate package) and where installation methodology may change (as it did in PostgreSQL 9.1).
I also think there is some value in having the syntax for the self-join a bit more obvious rather than having to run to the tablefunc documentation every time one wants to check. The code is clearer and this adds value too.
I haven't checked performance between these two examples, because this isn't very performance-sensitive. After all it is only there for troubleshooting menu structure, and it is only a set of read operations. Given other tests however I would not expect a performance cost.
Better Performance
LedgerSMB generates the menu for the tree-view in a stored procedure called menu_generate() which takes no arguments. The 1.3 version is:
CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
item menu_item;
arg menu_attribute%ROWTYPE;
BEGIN
FOR item IN
SELECT n.position, n.id, c.level, n.label, c.path,
to_args(array[ma.attribute, ma.value])
FROM connectby('menu_node', 'id', 'parent', 'position', '0',
0, ',')
c(id integer, parent integer, "level" integer,
path text, list_order integer)
JOIN menu_node n USING(id)
JOIN menu_attribute ma ON (n.id = ma.node_id)
WHERE n.id IN (select node_id
FROM menu_acl
JOIN (select rolname FROM pg_roles
UNION
select 'public') pgr
ON pgr.rolname = role_name
WHERE pg_has_role(CASE WHEN coalesce(pgr.rolname,
'public')
= 'public'
THEN current_user
ELSE pgr.rolname
END, 'USAGE')
GROUP BY node_id
HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
THEN FALSE
WHEN acl_type ilike 'ALLOW'
THEN TRUE
END))
or exists (select cn.id, cc.path
FROM connectby('menu_node', 'id', 'parent',
'position', '0', 0, ',')
cc(id integer, parent integer,
"level" integer, path text,
list_order integer)
JOIN menu_node cn USING(id)
WHERE cn.id IN
(select node_id FROM menu_acl
JOIN (select rolname FROM pg_roles
UNION
select 'public') pgr
ON pgr.rolname = role_name
WHERE pg_has_role(CASE WHEN
coalesce(pgr.rolname,
'public')
= 'public'
THEN current_user
ELSE pgr.rolname
END, 'USAGE')
GROUP BY node_id
HAVING bool_and(CASE WHEN acl_type
ilike 'DENY'
THEN false
WHEN acl_type
ilike 'ALLOW'
THEN TRUE
END))
and cc.path like c.path || ',%')
GROUP BY n.position, n.id, c.level, n.label, c.path, c.list_order
ORDER BY c.list_order
LOOP
RETURN NEXT item;
END LOOP;
END;
$$ language plpgsql;
This query performs adequately for users with lots of permissions but it doesn't do so well when a user has few permissions. Even on decent hardware the query may take several seconds to run if the user only has access to items within one menu section. In many cases this isn't acceptable performance but on PostgreSQL 8.3 we don't have a real alternative outside of trigger-maintained materialized views. For PostgreSQL 8.4, however, we can do this:
CREATE OR REPLACE FUNCTION menu_generate() RETURNS SETOF menu_item AS
$$
DECLARE
item menu_item;
arg menu_attribute%ROWTYPE;
BEGIN
FOR item IN
WITH RECURSIVE tree (path, id, parent, level, positions)
AS (select id::text as path, id, parent,
0 as level, position::text
FROM menu_node where parent is null
UNION
select path || ',' || n.id::text, n.id,
n.parent,
t.level + 1,
t.positions || ',' || n.position
FROM menu_node n
JOIN tree t ON t.id = n.parent)
SELECT n.position, n.id, c.level, n.label, c.path, n.parent,
to_args(array[ma.attribute, ma.value])
FROM tree c
JOIN menu_node n USING(id)
JOIN menu_attribute ma ON (n.id = ma.node_id)
WHERE n.id IN (select node_id
FROM menu_acl acl
LEFT JOIN pg_roles pr on pr.rolname = acl.role_name
WHERE CASE WHEN role_name
ilike 'public'
THEN true
WHEN rolname IS NULL
THEN FALSE
ELSE pg_has_role(rolname,
'USAGE')
END
GROUP BY node_id
HAVING bool_and(CASE WHEN acl_type ilike 'DENY'
THEN FALSE
WHEN acl_type ilike 'ALLOW'
THEN TRUE
END))
or exists (select cn.id, cc.path
FROM tree cc
JOIN menu_node cn USING(id)
WHERE cn.id IN
(select node_id
FROM menu_acl acl
LEFT JOIN pg_roles pr
on pr.rolname = acl.role_name
WHERE CASE WHEN rolname
ilike 'public'
THEN true
WHEN rolname IS NULL
THEN FALSE
ELSE pg_has_role(rolname,
'USAGE')
END
GROUP BY node_id
HAVING bool_and(CASE WHEN acl_type
ilike 'DENY'
THEN false
WHEN acl_type
ilike 'ALLOW'
THEN TRUE
END))
and cc.path::text
like c.path::text || ',%')
GROUP BY n.position, n.id, c.level, n.label, c.path, c.positions,
n.parent
ORDER BY string_to_array(c.positions, ',')::int[]
LOOP
RETURN NEXT item;
END LOOP;
END;
$$ language plpgsql;
This function is not significantly longer, it is quite a bit clearer, and it performs a lot better. For users with full permissions it runs twice as fast as the 1.3 version, but for users with few permissions, this runs about 70x faster. The big improvement comes from the fact that we are only instantiating the menu node tree once.
So not only do we have fewer external dependencies now but there are significant performance improvements.
A Complex Example
CTE's can be used to "materialize" a subquery for a given query. They can also be nested. The ordering can be important because this can give you a great deal of control as to what exactly you are joining.
I won't post the full source code for the new trial balance function here. However, the main query, which contains a nested CTE.
Here we are using the CTE's in order to achieve some stability across joins and avoid join projection issues, as well as creating a tree of projects, departments, etc (business_unit is the table that stores this).
The relevant part of the stored procedure is:
RETURN QUERY
WITH ac (transdate, amount, chart_id) AS (
WITH RECURSIVE bu_tree (id, path) AS (
SELECT id, id::text AS path
FROM business_unit
WHERE parent_id = any(in_business_units)
OR (parent_id = IS NULL
AND (in_business_units = '{}'
OR in_business_units IS NULL))
UNION
SELECT bu.id, bu_tree.path || ',' || bu.id
FROM business_unit bu
JOIN bu_tree ON bu_tree.id = bu.parent_id
)
SELECT ac.transdate, ac.amount, ac.chart_id
FROM acc_trans ac
JOIN (SELECT id, approved, department_id FROM ar UNION ALL
SELECT id, approved, department_id FROM ap UNION ALL
SELECT id, approved, department_id FROM gl) gl
ON ac.approved and gl.approved and ac.trans_id = gl.id
LEFT JOIN business_unit_ac buac ON ac.entry_id = buac.entry_id
LEFT JOIN bu_tree ON buac.bu_id = bu_tree.id
WHERE ac.transdate BETWEEN t_roll_forward + '1 day'::interval
AND t_end_date
AND ac.trans_id <> ALL(ignore_trans)
AND (in_department is null
or gl.department_id = in_department)
((in_business_units = '{}' OR in_business_units IS NULL)
OR bu_tree.id IS NOT NULL)
)
SELECT a.id, a.accno, a.description, a.gifi_accno,
case when in_date_from is null then 0 else
CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
* (coalesce(cp.amount, 0)
+ sum(CASE WHEN ac.transdate <= coalesce(in_date_from,
t_roll_forward)
THEN ac.amount ELSE 0 END)) end,
sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
t_roll_forward)
AND coalesce(in_date_to,
ac.transdate)
AND ac.amount < 0 THEN ac.amount * -1 ELSE 0 END) -
case when in_date_from is null then coalesce(cp.debits, 0) else 0 end,
sum(CASE WHEN ac.transdate BETWEEN coalesce(in_date_from,
t_roll_forward)
AND coalesce(in_date_to,
ac.transdate)
AND ac.amount > 0 THEN ac.amount ELSE 0 END) +
case when in_date_from is null then coalesce(cp.credits, 0) else 0 end,
CASE WHEN a.category IN ('A', 'E') THEN -1 ELSE 1 END
* (coalesce(cp.amount, 0) + sum(coalesce(ac.amount, 0)))
FROM account a
LEFT JOIN ac ON ac.chart_id = a.id
LEFT JOIN account_checkpoint cp ON cp.account_id = a.id
AND end_date = t_roll_forward
WHERE (in_accounts IS NULL OR in_accounts = '{}'
OR a.id = ANY(in_accounts))
AND (in_heading IS NULL OR in_heading = a.heading)
GROUP BY a.id, a.accno, a.description, a.category, a.gifi_accno,
cp.end_date, cp.account_id, cp.amount, cp.debits, cp.credits
ORDER BY a.accno;
This query does a lot including rolling balances froward from checkpoints (which also effectively close the books), pulling only transactions which match specific departments or projects, and more.
Conclusion
We have found the CTE to be a tool which is hard to underrate in importance and usefulness. They have given us maintainability and performance advantages, and they are very flexible. We will probably move towards using more of them in the future as appropriate.
No comments:
Post a Comment