Perspectives on LedgerSMB: 3 Things I Wish PostgreSQL Did

Tuesday, March 13, 2012

There are times when I work with PostgreSQL and wish it could do some things that neither it nor other relational databases can do at present. In my view (and I will defend each of these feature requests), they all fit well within a relational, database-centric world.

3: XML/JSON to tuple type

One of the problems that one runs into in a stored-procedure-centric application is managing the application's input and output. An ability to convert XML or JSON to a tuple type would help here (remember that tuples can have members that are tuples, or arrays of tuples). The formats are semantically equivalent, so why not allow for conversion?
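To make the idea concrete, here is a minimal sketch in Python rather than PostgreSQL, with named tuples standing in for composite types. The type names (`JournalEntry`, `JournalLine`) and the helper `from_json` are made up for illustration and are not part of LedgerSMB or PostgreSQL. It only shows that a JSON document maps naturally onto a tuple containing an array of tuples.

```python
import json
from typing import List, NamedTuple, get_args, get_origin, get_type_hints


class JournalLine(NamedTuple):      # hypothetical composite type
    account: str
    amount: str                     # kept as text to avoid float rounding


class JournalEntry(NamedTuple):     # hypothetical composite type
    reference: str
    lines: List[JournalLine]        # an array of tuples nested in a tuple


def is_tuple_type(tp):
    """True for NamedTuple classes, which play the role of composite types."""
    return isinstance(tp, type) and issubclass(tp, tuple) and hasattr(tp, "_fields")


def from_json(tp, value):
    """Recursively map decoded JSON onto the declared tuple type."""
    if get_origin(tp) is list:
        # Array member: convert every element to the declared item type.
        (item_type,) = get_args(tp)
        return [from_json(item_type, item) for item in value]
    if is_tuple_type(tp):
        # Tuple member: convert each field by name, using its declared type.
        hints = get_type_hints(tp)
        return tp(**{name: from_json(hints[name], value[name]) for name in tp._fields})
    return value                    # scalar: pass through unchanged


doc = """
{"reference": "INV-1",
 "lines": [{"account": "1200", "amount": "-100.00"},
           {"account": "4000", "amount": "100.00"}]}
"""
entry = from_json(JournalEntry, json.loads(doc))
print(entry.reference, len(entry.lines))
```

The conversion is driven entirely by the declared type, which is the property I would want from a database-side equivalent: the stored procedure declares what it accepts, and the document is checked against that shape as it is converted.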

2: Nested Namespaces

Managing hundreds or thousands of stored procedures can be a problem in a flat namespace. We end up having to semantically create hierarchical function names or hierarchically named namespaces in a flat namespace world. Even if one cannot do it for tables, there ought to be some way to do this for functions.

Part of the problem here is that the SQL standards give semantic value to different namespace lengths, so this would have to be solved. It is not an easy problem to solve and it may not have a solution. Perhaps a package delimiter within a namespace would be helpful? Maybe a character like # or $?

1: Rich Declarative Constraints for Accounting Applications

It seems strange to me that accounting applications have been one of the primary uses of RDBMSs since their inception, and yet there is no declarative constraint for ensuring that transactions are balanced.

Whether a transaction is balanced or not is fundamentally a set-based operation. A transaction is balanced when the sum of the debits of the rows in each transaction is equal to the sum of the credits of the rows in each transaction. In LedgerSMB, debits are negative amounts, and credits are positive amounts (they are just presented to the user as fundamentally different). So I would like the ability to do something like:

CHECK FOR EACH TRANSACTION (SUM(amount) = 0) GROUP BY trans_id;

Part of the problem I suppose is that getting this right in a row-locking environment is hard. However, in an MVCC environment it should be possible to check if this constraint matches the deleted and inserted rows, and if it does (0 + 0 = 0), we know we are balanced (deletions would only happen for unapproved transactions, i.e. ones that have not yet hit the books). Such a check would only fire once per transaction, and only check rows modified by the transaction. It therefore shouldn't be problematic in the way an aggregate check would be over the entire table.
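The logic of such a check can be sketched outside the database. The following Python is only an illustration of the reasoning above, not a proposal for how PostgreSQL would implement it. The function names and the `(trans_id, amount)` row shape are assumptions made for the example. It groups only the rows touched by one database transaction and applies the sum-to-zero test to the inserted set and to the deleted set.

```python
from collections import defaultdict
from decimal import Decimal


def unbalanced(rows):
    """Return the trans_ids whose amounts do not sum to zero.

    rows is an iterable of (trans_id, amount) pairs; debits are negative
    and credits are positive, as in the convention described above.
    """
    totals = defaultdict(Decimal)           # Decimal() starts each total at 0
    for trans_id, amount in rows:
        totals[trans_id] += amount
    return sorted(tid for tid, total in totals.items() if total != 0)


def changes_are_balanced(inserted, deleted):
    """Check only the rows changed by one database transaction."""
    return not unbalanced(inserted) and not unbalanced(deleted)


# Hypothetical rows written by a single database transaction.
inserted = [
    (1, Decimal("-100.00")),
    (1, Decimal("60.00")),
    (1, Decimal("40.00")),
    (2, Decimal("-25.00")),                 # no matching credit
]
print(unbalanced(inserted))
print(changes_are_balanced(inserted[:3], []))
```

The cost depends on the number of rows the transaction touched, not on the size of the table, which is the point of restricting the check to modified rows.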

The other two are nice, but if PostgreSQL could do this very well, it would be the king of databases for accounting software, and I am sure the feature would find new uses elsewhere.

So, what's on your list?

18 comments:

Anonymous, March 13, 2012 at 11:17 PM
Actually in progress already, and a big boost to your #3 is 1st class citizenship for plv8, for a few key reasons:

1) It's very fast, leveraging the smarties at Google, and programming in plpgsql is limited compared to a prototype-based language like JavaScript.

2) There are a lot more JavaScript programmers than pgsql, and that's not going to change.

3) It could very well be THE reason pg becomes more popular than MySQL for web dev. Being a more Turing-complete database should be enough, but it's not. From my own experience, we have always used MySQL for web apps because it's proven and we knew we could count on hiring web devs with experience with it. Everything has its pros and cons, but it has worked very well for us. But one night not too long ago, plv8 caught our attention and we haven't looked back. I look forward to writing a blog post soon about using pg/plv8 for web apps over MySQL.

Pavel Stěhule, March 14, 2012 at 12:12 AM
@3 - do you know about constraint triggers? http://postgres.cz/wiki/PostgreSQL_SQL_Tricks#Deferred_constraints

@2 - I am not sure this makes sense. The flat model in use is adequate for SQL: it is simple, and so it is simple to use.

Chris Travers, March 14, 2012 at 2:26 AM
This comment has been removed by the author.

Anonymous, March 14, 2012 at 11:26 PM
Chris, am I right in suggesting that what you want to do for #3 is something along these lines:

For accounting it is important that for double-entry bookkeeping you can do things like make sure that all the accounts add up to zero all the time. So, in simplistic terms, "SELECT sum(balance) FROM account" would equal 0 at the point a commit is issued.

For small tables it's fine to run a complete check of all records, but as the table grows to millions of rows the performance penalty of this check becomes too large to execute at the table level.

Because the accounts always have to add up to zero, all the changes within a single commit also have to add up to zero. If there were a mechanism to check just the changes, this would be a far more efficient solution.

Currently it is only possible to either check the whole table at a time, or to pass _all_ changes to the "accounts" table through a stored procedure. Whilst the latter works, it's an inelegant solution to a common problem.

I too am storing account information in a similar manner within Postgres so if we could declare a COMMIT wide constraint then that would certainly help with this scenario.

Anonymous, March 14, 2012 at 11:30 PM
Forgive the horrible bastardisation of existing syntax, but something along these lines:

CREATE CONSTRAINT TRIGGER example
AFTER COMMIT ON accounts
FOR EACH CHANGED ROW sum(balance) = 0

Josh Berkus, March 18, 2012 at 12:27 PM
Chris,

The issue with namespace nesting has been with us since SQL92. The basic issue is that the SQL committee chose a single level of namespacing. This drives our object identifier resolution, which means that additional levels of namespacing break things. For example, imagine that you have:

schema accounts
schema customer
schema customer/accounts
function accounts.balance()
function customer/accounts.balance()

... and all of those are in your search_path.

Now, if you call accounts.balance(), what do you get? Nested namespaces take something that is deterministic in the current implementation and either make it non-deterministic or force you to namespace-qualify everything, which would break a lot of code.

Saying that you can't have namespace name collisions isn't helpful either, since the whole purpose of nesting namespaces is to allow name collisions.

This issue comes up every time anyone discusses Oracle packages. Nobody has come up with a solution yet. It's a real problem, and one worth solving, but it's a hard problem.

Tony, March 19, 2012 at 5:56 AM
As for the XML: please check http://www.pgxn.org/dist/pg_xnode/0.6.1/
It's a project that I'm trying to start. It is still quite an early pre-release, but it already has a function that you might like:

node[] xml.children(node)

where the function can be recursively applied on each item of the result array.

Any feedback is appreciated.