So I have decided to begin work on a set of CPAN modules, essentially implementing the stored procedure interface work that I have done for LedgerSMB, and incorporating the lessons learned by myself and others. It is my hope that Perl programmers will find this useful in building sophisticated applications on PostgreSQL. The idea is to turn PostgreSQL into an encapsulated database sever with a discoverable API, and then allow programs to discover the API directly. In my experience this leads to more robust code because code contracts between stored procedures and applications are more readily enforcible.
I just got preliminary versions of two of them on CPAN. The first is PGObject which implements the database-facing routines, and PGObject::Simple which is developer-facing. The reason for breaking things off this way is to allow for multiple different developer-facing systems off the same backend-facing services. These are functional, but ones with a great deal more functionality are on their way.
I feel a need to justify "Simple" in the name of the latter. PGObject::Simple is a quick and dirty stored procedure mapping convention system based on our approach in LedgerSMB. It is not reduced in functionality, but the overall conventions are simple, and therefore require less overhead to use. This also means, however, that the flexibility is such that it requires a certain discipline to use effectively. Objects can quickly become amorphous but used with discipline it is a very powerful system.
For additional structure, PGObject::Simple::Role is under development which creates a Moo role, which can be used with or without Moose, to provide this sort of functionality in more formalized object systems.
These modules are designed to facilitate mapping of stored procedure functionality into applications, allowing PostgreSQL databases to become encapsulated intelligent information servers in their own right. In future posts I will be going over both of these in detail.
For now though start with the documentation in the links above.
This blog tracks development of the open source accounting and ERP software LedgerSMB. I also offer some perspectives on PostgreSQL including new features which we may find useful. Brought to you by Metatron Technology Consulting.
Monday, May 27, 2013
Wednesday, May 1, 2013
Arrays and Indexing, problems and solutions
Arrays in PostgreSQL are relatively well supported by the indexing engines of this RDBMS. There are a fair number of options however to be aware of and an understanding of how arrays are indexed is very helpful. Before going forward, please read my intro on arrays so that you understand the basics.
I am leaving out hash indexes which are not very useful for arrays (or much of anything else). There are probably cases where hash indexes are useful but if you need them you will know....
The default type of index used by PostgreSQL is a btree index, which provides a search tree using only three operators: <, =, >. Ordering and ordinality of arrays here is quite important and btree indexes are essentially useless when indexing arrays where ordinality is unimportant.
One should remember that btree indexes are useful primarily in ordering based searches. The following could use a btree index (and might depending on table statistics):
The following cannot work with a btree index under any circumstances:
Btree indexes may be useful in some cases with arrays, particularly where ordinality is very important to the search. However, they are entirely useless where ordinality is unimportant.
PostgreSQL provides two more general indexing opportunities that work for more general (and non-1NF uses) of arrays than do Btree indexes.
In general the tradeoff is that GIN indexes are more costly to maintain and take up more disk space but are less costly to query. As the read to write ratio goes up, GiST indexes start looking more desirable and GIN indexes start looking less desirable.
These indexes are helpful because they suppose a wider range of searches than do BTree indexes. In particular you can do overlap searches (select * from pages where tags && array['PostgreSQL']), and much more. GiST and GIN indexes also support other operators. Also it is worth noting that GiST indexes are originally designed for geometric and geospacial searches, while GIN indexes are designed for arrays.
The supported GiST operators are:
The supported GIN operators are:
On the whole the two most important indexes depending on what you are doing with the array are btree and GIN indexes. Btrees cover well the case where ordinality matters and the search is primarily on the beginning of the array. GIN indexes cover more general (and in particular non-1NF setups) relatively well. GiST indexes seem nice, but on the whole they do not offer a good performance/functionality tradeoff out of the box. They would be powerful tools however for the advanced developer.
I am leaving out hash indexes which are not very useful for arrays (or much of anything else). There are probably cases where hash indexes are useful but if you need them you will know....
BTREE Indexing for Ordinality-centric arrays
The default type of index used by PostgreSQL is a btree index, which provides a search tree using only three operators: <, =, >. Ordering and ordinality of arrays here is quite important and btree indexes are essentially useless when indexing arrays where ordinality is unimportant.
One should remember that btree indexes are useful primarily in ordering based searches. The following could use a btree index (and might depending on table statistics):
- SELECT id FROM mytable WHERE myarray < '{0,2,1,2,3}';
- SELECT id FROM mytable WHERE myarray = '{0,6,5,4,3}';
- SELECT id FROM mytable WHERE myarray[1] = '5';
The following cannot work with a btree index under any circumstances:
- SELECT id FROM mytable WHERE myarray[2] = 5;
Btree indexes may be useful in some cases with arrays, particularly where ordinality is very important to the search. However, they are entirely useless where ordinality is unimportant.
GiST and GIN indexing
PostgreSQL provides two more general indexing opportunities that work for more general (and non-1NF uses) of arrays than do Btree indexes.
In general the tradeoff is that GIN indexes are more costly to maintain and take up more disk space but are less costly to query. As the read to write ratio goes up, GiST indexes start looking more desirable and GIN indexes start looking less desirable.
These indexes are helpful because they suppose a wider range of searches than do BTree indexes. In particular you can do overlap searches (select * from pages where tags && array['PostgreSQL']), and much more. GiST and GIN indexes also support other operators. Also it is worth noting that GiST indexes are originally designed for geometric and geospacial searches, while GIN indexes are designed for arrays.
The supported GiST operators are:
- << Strictly left of
- &< Extends to the left of
- &> Extends to the right of
- >> Strictly right of
- <<| Strictly Below
- &<| Does not extend above
- |&> Does not extend below
- |>> Strictly above
- @> Contains
- <@ Contained in
- ~= Is the same as (identical to)
- && Overlaps
The supported GIN operators are:
- <@ Is contained in
- @> Contains
- = equals
- && Overlaps
On the whole the two most important indexes depending on what you are doing with the array are btree and GIN indexes. Btrees cover well the case where ordinality matters and the search is primarily on the beginning of the array. GIN indexes cover more general (and in particular non-1NF setups) relatively well. GiST indexes seem nice, but on the whole they do not offer a good performance/functionality tradeoff out of the box. They would be powerful tools however for the advanced developer.