Tuesday, August 21, 2012

O/R Modelling Part 1: Intro to PostgreSQL as Object-Relational Database Management System

This is a very brief intro to PostgreSQL as an object-relational database management system.   In future blog posts, we will look at more hands-on examples of these features in action.  Keep in mind these are advanced features typically used by advanced applications.

This is a very brief guide to the concepts we will be looking at more deeply in future posts, tying together in recipes and examples.   While PostgreSQL was initially designed to explore object-relational modelling possibilities, the toolkit today is somewhat different than it was initially intended, and therefore the focus of this series will be how to use PostgreSQL in an Object-Relational manner, rather than tracking the history of various components.

How is PostgreSQL "Object-Relational?"

The term Object-Relational has been applied to databases which attempt to bridge the relational and object-oriented worlds with varying degrees of success.  Bridging this gap is typically seen as desirable because object-oriented and relational models are very different paradigms and programmers often do not want to switch between them.  There are, however, fundamental differences that make this a very hard thing to do well.  The best way to think of PostgreSQL in this way is as a relational database management system with some object-oriented features.

By blending object-primative and relational models, it is often possible to provide much more sophisticated data models than one can using the relatively limited standard types in SQL.  This can be done both as an interface between an application and database, and as intra-query logic.  In future posts I will offer specific examples of each concept, explore how PostgreSQL differs from Oracle, DB2, and Informix in this area.

PostgreSQL is a development platform in a box.  It supports stored procedures written in entirely procedural languages like PL/PGSQL or Perl without loaded modules, and more object-oriented languages like Python or Java, often through third party modules.  To be sure you can't write a graphical interface inside PostgreSQL, and it would not be a good idea to write additional network servers, such as web servers, directly inside the database.  However the environment allows you to create sophisticated interfaces for managing and transforming your data. Because it is a platform in a box the various components need to be understood as different and yet interoperable.  In fact the primary concerns of object-oriented programming are all supported by PostgreSQL, but this is done in a way that is almost, but not quite, entirely unlike traditional object oriented programming.  For this reason the "object-relational" label tends to be a frequent source of confusion.

Data storage in PostgreSQL is entirely relational, although this can be degraded using types which are not atomic, such as arrays, XML, JSON, and hstore.  Before delving into object-oriented approaches, it is important to master the relational model of databases.  For the novice, this section is therefore entirely informational.  For the advanced developer, however, it is hoped that it will prove inspirational.

In object-oriented terms, every relation is a class, but not every class is a relation.  Operations are performed on sets of objects (an object being a row), and new row structures can be created ad-hoc.  PostgreSQL is, however, a strictly typed environment and so in many cases, polymorphism requires some work.

Data Abstraction and Encapsulation in PostgreSQL


The relational model itself provides some tools for data abstraction and encapsulation, and these features are taken to quite some length in PostgreSQL.  Taken together these are very powerful tools and allow for things like calculated fields to be simulated in relations and even indexed for high performance.

Views are the primary tool here.  With views, you can create an API for your data which is abstracted from the physical storage.  Using the rules system, you can redirect inserts, updates, and deletes from the view into underlying relations, preferably using user defined functions.  Being relations, views are also classes and methods.  Views cannot simply be inherited and workarounds cause many hidden gotchas.

A second important tool here is the ability to define what appear to be calculated fields using stored procedures.  If I create a table called "employee" with three fields (first_name, middle_name, last_name) among others, and create a function called "name" which accepts a single employee argument and concatenates these together as "last_name, first_name middle_name" then if I submit a query which says: 

select e.name from employee e;

it will transform this into:

select name(e) from employee e;

This gives you a way to do calculated fields in PostgreSQL without resorting to views. Note that these can be done on views as well because views are relations.  These are not real fields though.  Without the relation reference, it will not do the transformation (so SELECT name from employee will not have the same effect).

Messaging and Class API's in PostgreSQL


A relation is a class.  The class is accessed using SQL which defines a new data structure in its output.  This data structure unless defined elsewhere in a relation or a complex type cannot have methods attached to it and therefore can not be used with the class.method syntax described above.  There are exceptions to this rule, of course, but they are beyond the scope of this introduction.  In general it is safest to assume that the output of one query, particularly one with named output fields, cannot safely be used as the input to another.

A second messaging apparatus in PostgreSQL is the LISTEN/NOTIFY framework which can be used along with triggers to issue notifications to other processes when a transaction commits.  This approach allows you to create queue tables, use triggers to move data into these tables (creating 'objects' in the process) and then issuing a notification to another process when the data commits and becomes visible.  This allows for very complex and and interactive environments to be built from modular pieces.

Polymorphism in PostgreSQL


PostgreSQL is very extensible in terms of all sorts of aspects of the database.  Not only can types be created and defined, but also operators can be defined or overloaded.

A more important polymorphism feature is the ability to cast one data type as another.  Casts can be implicit or explicit.  Implicit casts, which have largely been removed from many areas of PostgreSQL, allow for PostgreSQL to cast data types when necessary to find functions or operators that are applicable.  Implicit casting can be dangerous because it can lead to unexpected behavior because minor errors  can lead to unexpected results.  '2012-05-31' is not 2012-05-31.  The latter is an integer expression that reduces to 1976.  If you create an implicit cast that turns an integer into a date being the first of the year, the lack of quoting will insert incorrect dates into your database without raising an error ('1976-01-01' instead of the intended '2012-05-31').  Implicit casts can still have some uses.

Inheritance in PostgreSQL


In PostgreSQL tables can inherit from other tables.  Their methods are inherited but implicit casts are not chained, nor are their indexes inherited.  This allows you develop object inheritance hierarchies in PostgreSQL.  Multiple inheritance is possible, unlike any other ORDBMS that I have looked at on the market (Oracle, DB2, and Informix all support single inheritance).

Table inheritance is an advanced concept and has many gotchas.  Please refer to the proper sections of the manual for more on this topic.  On the whole it is probably best to work with table inheritance first in areas where it is more typically used, such as table partitioning, and later look at it in terms of object-relational capabilities.

Overall the best way to look at PostgreSQL as an object-relational database is a database which provides very good relational capabilities plus some advanced features that allows one do create object-relational systems on top of it.  These systems can then move freely between object-oriented and relational worldviews but are still more relational than object-oriented.  At any rate they bear little resemblance to object-oriented programming environments today.  With PostgreSQL this is very much a toolkit approach for object-relational databases building on a solid relational foundation.  This means that these are advanced functions which are powerful in the hands of experienced architects, but may be skipped over at first.

Next week:   An intro to object-relational "classes."  We will build a simple class that represents items in inventory.


Forthcoming Posts:
  • An intro to object-relational "classes" looking at tables with extended functionality
  • Table inheritance in PostgreSQL (including how to solve key problems)
  • Composite and row types in tables, including references to other tables
  • Interlude:  Comparison to MySQL as a way of questioning how we think about databases (no flames!)
  • Nested Data Structures (including things to avoid)
  • Polymorphism in PostgreSQL 
  • Messaging in PostgreSQL between client applications
  • General Design Patterns and Anti-Patterns

6 comments:

  1. Thanks. Very interested to read the rest of this series.

    ReplyDelete
  2. The article is interestingly written and quite elaborated.The author has presented the detailed information about the various topics.
    sap project system

    ReplyDelete
  3. This can be a fantastic read for me personally, Must admit that you're one of the best bloggers I ever saw.Thank you for writing this informative article.
    Oracle Security

    ReplyDelete
  4. Nice posting and very interestingly, thank's

    Markotop PostgreSQL

    ReplyDelete
  5. I think, and Date seems to agree, that complex multivalued types can be atomic, even though they aren't scalar.

    ReplyDelete