Monday, September 15, 2014

PGObject Cookbook Part 1: Introduction

Preface


I have decided to put together a PGObject Cookbook, showing the power of this framework.  If anyone is interested in porting the db-looking sides to other languages, please let me know.  I would be glad to provide whatever help my time and skills allow.

The PGObject framework is a framework for integrated intelligent PostgreSQL databases into Perl applications.  It addresses some of the same problems as ORMs but does so in a very different way.  Some modules are almost ORM-like and more such modules are likely to be added in the future.  However unlike an ORM, PGObject mostly serves as an interface to stored procedures and whatever code generation routines will be added, these are not intended to be quickly changed.  Moreover it only supports PostgreSQL because we make extended use of PostgreSQL-only features.

For those who are clearly not interested in Perl, this series may still be interesting as it not only covers how to use the framework but also various problems that happen when we integrate databases with applications.  And there are people who should not use this framework because it is not the right tool for the job.  For example, if you are writing an application that must support many different database systems, you probably will get more out of an ORM than you will this framework.  But you still may get some interesting stuff from this series so feel free to enjoy it.

Along the way this will explore a lot of common problems that happen when writing database-centric applications and how these can be solved using the PGObject framework.  Other solutions of course exist and hopefully we can talk about these in the comments.

Much of the content here (outside of the prefaces) will go into a documentation module on CPAN.  However I expect it to also be of far more general interest since the problems are common problems across frameworks.

Introduction


PGObject is written under the theory that the database will be built as a server of information and only loosely tied to the application.  Therefore stored procedures should be able to add additional parameters without expecting that the application knows what to put there, so if the parameter can accept a null and provide the same answer as before, the application can be assured that the database is still usable.

The framework also includes a fairly large number of other capabilities.  As we work through we will go through the main areas of functionality one at a time, building on the simplest capabilities and moving onto the more advanced.  In general these capabilities can be grouped into basic, intermediate, and advanced:

Basic Functionality



  1. registered types, autoserialization, and autodeserialization.
  2. The simple stored procedure mapper
  3. Aggregates and ordering
  4. Declarative mapped methods

Intermediate Functionality


  1. The Bulk Loader
  2. The Composite Type stored procedure mapper
  3. The database admin functions

Advanced Functionality


  1. Memoization of Catalog Lookups
  2. Writing your own stored procedure mapper

This series will cover all the above functionality and likely more.  As we get through the series, I hope that it will start to make sense and we will start to get a lot more discussion (and hopefully use) surrounding the framework.

Design Principles


The PGObject framework came out of a few years of experience building and maintaining LedgerSMB 1.3.  In general we took what we liked and what seemed to work well and rewrote those things that didn't.  Our overall approach has been based on the following principles:
  • SQL-centric: Declarative, hand-coded SQL is usually more productive than application programming languages.  The system should leverage hand-coded SQL.
  • Leveraging Stored Procedures and Query Generators: The system should avoid having people generate SQL queries themselves as strings and executing them.  It's better to store them persistently in the db  or generate well-understood queries in general ways where necessary.
  • Flexible and Robust: It should be possible to extend a stored procedure's functionality (and arguments) without breaking existing applications.
  • DB-centric but Loosely Coupled:  The framework assumes that databases are the center of the environment, and that it is a self-contained service in its own right.  Applications need not be broken because the db structure changed, and the DB should be able to tell the application what inputs it expects.
  • Don't Make Unnecessary Decisions for the Developer:  Applications may use a framework in many atypical ways and we should support them.  This means that very often instead of assuming a single database connection, we instead provide hooks in the framework so the developer can decide how to approach this.  Consequently you can expect your application to have to slightly extend the framework to configure it.
This framework is likely to be very different from anything else you have used.  While it shares some similarities with iBatis in the Java world, it is unique in the sense that the SQL is stored in the database, not in config files.  And while it was originally inspired by a number of technologies (including both REST and SOAP/WSDL), it is very much unlike any other framework I have come across.

Next in Series:  Registered Types:  Autoserialization and Deserialization between Numeric and Math::BigFloat.

5 comments:

  1. Chris, what is the maturity level of PGObject?

    I'm expecting within the next few weeks to be starting on a sizeable new Perl+Postgres project for work and was wanting to have some kind of layer over the database to make it more Perlish but at the same time still close to the Postgres metal; I also wanted to avoid the complexity of ORMS like DBIC.

    I could make such a thing myself specific to the project but then your post reminded me that perhaps PGObject may be very helpful, assuming it is reasonably mature and has a decent test suite to assert correctness.

    Note that I was already anticipating generating the Postgres schema using custom Perl code that would also define corresponding Perl data types and in particular constraints that are enforced at both levels; it would also generate triggers for common auditing or whatever; the Perl code is essentially a factoring of the schema. While I still expect to do that, maybe PGObject would be useful for the generated code to employ to handle the details of interfacing with things in the running application.

    ReplyDelete
    Replies
    1. Darren, the maturity level is mixed.

      the most mature part is PGObject::Simple and PGObject::Simple::Role ("Simple" here refers to the type of data mapping, not that it is a simple version of the framework). I have it now working afaics with the LedgerSMB 1.5 codebase. This module allows a fairly loose coupling of the stored procedures to the PostgreSQL stored procedures, but it doesn't handle complex data very well. I.e. passing in a whole invoice at once is not readily possible. These are also based on about 5 years of experience with similar approaches in the LedgerSMB codebase. I would call them fairly mature.

      The least mature part is PGObject::Composite and PGObject::Composite::Role. It will likely be a week or two before I am using them in active development so I am not sure about whether I missed something in the API. The interface as written is tested however. These modules allow you to pass composite types in (at your option as JSON or tuple literals)

      Test coverage of the module ranges from about 75% on the least covered modules to 96%. I don't look at the code when I write the tests but looking at the coverage reports after the fact, the missing tests are trivial functions which just wrap interfaces to other modules.

      I am pretty happy with the test cases.

      Even without the framework I have two modules you may find very helpful:

      1. PGObject::Util::BulkLoad provides a declarative way to bulk load hashrefs into tables, either as inserts or upserts.

      2. PGObject::Util::DBAdmin provides a basic API around the PostgreSQL command line utilities, with error detection and some other nicities.

      Neither of those require the framework (nor does anything in the PGObject::Util namespace but what else needs parsers for tuple literals?).

      Delete
    2. On the whole I would say, if you are using PGObject::Simple or PGObject::Simple::Role, you have nothing to worry about maturity-wise. The code has been used in some significant codebases, is well tested, and is unlikely to change in ways that break more than the odd corner case.

      If you want to have objects which autoserialize into tuple literals (possibly including properties which are arrays of other tuples), you may want to wait until I have a little more experience using that part of the code....

      Delete
    3. Thank you, Chris. For my intended use with this particular project, I was just intending to use Postgres' basic built in types and generally avoid any user-defined types except maybe DOMAINs. Any use of composite types would be rare, and just where they're clearly a better design choice than separate related tables. Your ::Simple features sound like mainly what I am interested in at this stage, rather than composite type serialization/deserialization. This could evolve in the future though, as it would be very handy to be able to pull out or push in all the data for a transaction / small report in a single Postgres call, which in the general case means a composite data structure. I would also be interested in the ::Util offerings you mention. Thanks again.

      Delete
    4. Also wxPOS (a point of sale add-on for LedgerSMB) uses PGObject::Simple::Role and there are production deployments of that which have been around for the last six months or so.

      Delete